garch modelshome.iitj.ac.in/~parmod/document/garch forecasting model.pdf · notation xiii 1...

505

Upload: others

Post on 29-Sep-2020

4 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 2: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 3: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH Models

Page 4: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 5: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH ModelsStructure, Statistical Inference

and Financial Applications

Christian Francq

University Lille 3, Lille, France

Jean-Michel Zakoıan

CREST, Paris, and University Lille 3, France

A John Wiley and Sons, Ltd., Publication

Page 6: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

This edition first published 2010 2010 John Wiley & Sons Ltd

Registered officeJohn Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permissionto reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright,Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in anyform or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UKCopyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be availablein electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names andproduct names used in this book are trade names, service marks, trademarks or registered trademarks of their respectiveowners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designedto provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understandingthat the publisher is not engaged in rendering professional services. If professional advice or other expert assistance isrequired, the services of a competent professional should be sought.

Library of Congress Cataloguing-in-Publication DataFrancq, Christian.

[Models GARCH. English]GARCH models : structure, statistical inference, and financial applications / Christian Francq, Jean-Michel Zakoian.

p. cm.Includes bibliographical references and index.ISBN 978-0-470-68391-0 (cloth)

1. Finance–Mathematical models. 2. Investments–Mathematical models. I. Zakoian, Jean-Michel. II. Title.HG106.F7213 2010332.01’5195–dc22

2010013116

A catalogue record for this book is available from the British Library

ISBN: 978-0-470-68391-0

Typeset in 9/11pt Times-Roman by Laserwords Private Limited, Chennai, India.Printed and bound in the United Kingdom by Antony Rowe Ltd, Chippenham, Wiltshire.

Page 7: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Contents

Preface xi

Notation xiii

1 Classical Time Series Models and Financial Series 11.1 Stationary Processes 11.2 ARMA and ARIMA Models 31.3 Financial Series 71.4 Random Variance Models 101.5 Bibliographical Notes 121.6 Exercises 12

Part I Univariate GARCH Models 17

2 GARCH(p, q) Processes 192.1 Definitions and Representations 192.2 Stationarity Study 24

2.2.1 The GARCH(1, 1) Case 242.2.2 The General Case 28

2.3 ARCH (∞) Representation∗ 392.3.1 Existence Conditions 392.3.2 ARCH (∞) Representation of a GARCH 422.3.3 Long-Memory ARCH 43

2.4 Properties of the Marginal Distribution 452.4.1 Even-Order Moments 452.4.2 Kurtosis 48

2.5 Autocovariances of the Squares of a GARCH 502.5.1 Positivity of the Autocovariances 502.5.2 The Autocovariances Do Not Always Decrease 512.5.3 Explicit Computation of the Autocovariances of the Squares 52

2.6 Theoretical Predictions 532.7 Bibliographical Notes 572.8 Exercises 58

Page 8: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

vi CONTENTS

3 Mixing* 633.1 Markov Chains with Continuous State Space 633.2 Mixing Properties of GARCH Processes 683.3 Bibliographical Notes 763.4 Exercises 76

4 Temporal Aggregation and Weak GARCH Models 794.1 Temporal Aggregation of GARCH Processes 79

4.1.1 Nontemporal Aggregation of Strong Models 804.1.2 Nonaggregation in the Class of Semi-Strong GARCH Processes 81

4.2 Weak GARCH 824.3 Aggregation of Strong GARCH Processes in the Weak GARCH Class 854.4 Bibliographical Notes 884.5 Exercises 89

Part II Statistical Inference 91

5 Identification 935.1 Autocorrelation Check for White Noise 93

5.1.1 Behavior of the Sample Autocorrelations of a GARCH Process 945.1.2 Portmanteau Tests 975.1.3 Sample Partial Autocorrelations of a GARCH 975.1.4 Numerical Illustrations 98

5.2 Identifying the ARMA Orders of an ARMA-GARCH 1005.2.1 Sample Autocorrelations of an ARMA-GARCH 1015.2.2 Sample Autocorrelations of an ARMA-GARCH Process When the Noise

is Not Symmetrically Distributed 1045.2.3 Identifying the Orders (P,Q) 106

5.3 Identifying the GARCH Orders of an ARMA-GARCH Model 1085.3.1 Corner Method in the GARCH Case 1095.3.2 Applications 109

5.4 Lagrange Multiplier Test for Conditional Homoscedasticity 1115.4.1 General Form of the LM Test 1115.4.2 LM Test for Conditional Homoscedasticity 115

5.5 Application to Real Series 1175.6 Bibliographical Notes 1205.7 Exercises 122

6 Estimating ARCH Models by Least Squares 1276.1 Estimation of ARCH(q) models by Ordinary Least Squares 1276.2 Estimation of ARCH(q) Models by Feasible Generalized Least Squares 1326.3 Estimation by Constrained Ordinary Least Squares 135

6.3.1 Properties of the Constrained OLS Estimator 1356.3.2 Computation of the Constrained OLS Estimator 137

6.4 Bibliographical Notes 1386.5 Exercises 138

7 Estimating GARCH Models by Quasi-Maximum Likelihood 1417.1 Conditional Quasi-Likelihood 141

7.1.1 Asymptotic Properties of the QMLE 1437.1.2 The ARCH(1) Case: Numerical Evaluation of the Asymptotic Variance 147

Page 9: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CONTENTS vii

7.1.3 The Nonstationary ARCH(1) 1487.2 Estimation of ARMA-GARCH Models by Quasi-Maximum Likelihood 1507.3 Application to Real Data 1557.4 Proofs of the Asymptotic Results* 1567.5 Bibliographical Notes 1807.6 Exercises 180

8 Tests Based on the Likelihood 1858.1 Test of the Second-Order Stationarity Assumption 1868.2 Asymptotic Distribution of the QML When θ0 is at the Boundary 187

8.2.1 Computation of the Asymptotic Distribution 1918.3 Significance of the GARCH Coefficients 194

8.3.1 Tests and Rejection Regions 1948.3.2 Modification of the Standard Tests 1968.3.3 Test for the Nullity of One Coefficient 1978.3.4 Conditional Homoscedasticity Tests with ARCH Models 1998.3.5 Asymptotic Comparison of the Tests 201

8.4 Diagnostic Checking with Portmanteau Tests 2048.5 Application: Is the GARCH(1,1) Model Overrepresented? 2048.6 Proofs of the Main Results∗ 2078.7 Bibliographical Notes 2158.8 Exercises 215

9 Optimal Inference and Alternatives to the QMLE* 2199.1 Maximum Likelihood Estimator 219

9.1.1 Asymptotic Behavior 2209.1.2 One-Step Efficient Estimator 2229.1.3 Semiparametric Models and Adaptive Estimators 2239.1.4 Local Asymptotic Normality 226

9.2 Maximum Likelihood Estimator with Misspecified Density 2319.2.1 Condition for the Convergence of θn,h to θ0 2319.2.2 Reparameterization Implying the Convergence of θn,h to θ0 2329.2.3 Choice of Instrumental Density h 2339.2.4 Asymptotic Distribution of θn,h 234

9.3 Alternative Estimation Methods 2369.3.1 Weighted LSE for the ARMA Parameters 2369.3.2 Self-Weighted QMLE 2379.3.3 Lp Estimators 2379.3.4 Least Absolute Value Estimation 2389.3.5 Whittle Estimator 238

9.4 Bibliographical Notes 2399.5 Exercises 239

Part III Extensions and Applications 243

10 Asymmetries 24510.1 Exponential GARCH Model 24610.2 Threshold GARCH Model 25010.3 Asymmetric Power GARCH Model 25610.4 Other Asymmetric GARCH Models 25810.5 A GARCH Model with Contemporaneous Conditional Asymmetry 259

Page 10: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

viii CONTENTS

10.6 Empirical Comparisons of Asymmetric GARCH Formulations 26110.7 Bibliographical Notes 26910.8 Exercises 270

11 Multivariate GARCH Processes 27311.1 Multivariate Stationary Processes 27311.2 Multivariate GARCH Models 275

11.2.1 Diagonal Model 27611.2.2 Vector GARCH Model 27711.2.3 Constant Conditional Correlations Models 27911.2.4 Dynamic Conditional Correlations Models 28111.2.5 BEKK-GARCH Model 28111.2.6 Factor GARCH Models 284

11.3 Stationarity 28611.3.1 Stationarity of VEC and BEKK Models 28611.3.2 Stationarity of the CCC Model 289

11.4 Estimation of the CCC Model 29111.4.1 Identifiability Conditions 29211.4.2 Asymptotic Properties of the QMLE of the CCC-GARCH model 29411.4.3 Proof of the Consistency and the Asymptotic Normality of the QML 296

11.5 Bibliographical Notes 30711.6 Exercises 308

12 Financial Applications 31112.1 Relation between GARCH and Continuous-Time Models 311

12.1.1 Some Properties of Stochastic Differential Equations 31112.1.2 Convergence of Markov Chains to Diffusions 313

12.2 Option Pricing 31912.2.1 Derivatives and Options 31912.2.2 The Black–Scholes Approach 31912.2.3 Historic Volatility and Implied Volatilities 32112.2.4 Option Pricing when the Underlying Process is a GARCH 321

12.3 Value at Risk and Other Risk Measures 32712.3.1 Value at Risk 32712.3.2 Other Risk Measures 33112.3.3 Estimation Methods 334

12.4 Bibliographical Notes 33712.5 Exercises 338

Part IV Appendices 341

A Ergodicity, Martingales, Mixing 343A.1 Ergodicity 343A.2 Martingale Increments 344A.3 Mixing 347

A.3.1 α-Mixing and β-Mixing Coefficients 348A.3.2 Covariance Inequality 349A.3.3 Central Limit Theorem 352

B Autocorrelation and Partial Autocorrelation 353B.1 Partial Autocorrelation 353B.2 Generalized Bartlett Formula for Nonlinear Processes 359

Page 11: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CONTENTS ix

C Solutions to the Exercises 365

D Problems 439

References 473

Index 487

Page 12: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 13: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Preface

Autoregressive conditionally heteroscedastic (ARCH) models were introduced by Engle in anarticle published in Econometrica in the early 1980s (Engle, 1982). The proposed application inthat article focused on macroeconomic data and one could not imagine, at that time, that the mainfield of application for these models would be finance. Since the mid-1980s and the introduction ofgeneralized ARCH (or GARCH) models, these models have become extremely popular among bothacademics and practitioners. GARCH models led to a fundamental change to the approaches usedin finance, through an efficient modeling of volatility (or variability) of the prices of financial assets.In 2003, the Nobel Prize for Economics was jointly awarded to Robert F. Engle and Clive W.J.Granger ‘for methods of analyzing economic time series with time-varying volatility (ARCH)’.

Since the late 1980s, numerous extensions of the initial ARCH models have been published (seeBollerslev, 2008, for a (tentatively) exhaustive list). The aim of the present volume is not to reviewall these models, but rather to provide a panorama, as wide as possible, of current research intothe concepts and methods of this field. Along with their development in econometrics and financejournals, GARCH models and their extensions have given rise to new directions for research inprobability and statistics. Numerous classes of nonlinear time series models have been suggested,but none of them has generated interest comparable to that in GARCH models. The interest of theacademic world in these models is explained by the fact that they are simple enough to be usablein practice, but also rich in theoretical problems, many of them unsolved.

This book is intended primarily for master’s students and junior researchers, in the hope ofattracting them to research in applied mathematics, statistics or econometrics. For experiencedresearchers, this book offers a set of results and references allowing them to move towards oneof the many topics discussed. Finally, this book is aimed at practitioners and users who may belooking for new methods, or may want to learn the mathematical foundations of known methods.

Some parts of the text have been written for readers who are familiar with probability theoryand with time series techniques. To make this book as self-contained as possible, we providedemonstrations of most theoretical results. On first reading, however, many demonstrations canbe omitted. Those sections or chapters that are the most mathematically sophisticated and canbe skipped without loss of continuity are marked with an asterisk. We have illustrated the maintechniques with numerical examples, using real or simulated data. Program codes allowing theexperiments to be reproduced are provided in the text and on the authors’ web pages. In general,we have tried to maintain a balance between theory and applications.

Readers wishing to delve more deeply into the concepts introduced in this book will find alarge collection of exercises along with their solutions. Some of these complement the proofs givenin the text.

The book is organized as follows. Chapter 1 introduces the basics of stationary processes andARMA modeling. The rest of the book is divided into three parts. Part I deals with the standardunivariate GARCH model. The main probabilistic properties (existence of stationary solutions,

Page 14: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

xii PREFACE

representations, properties of autocorrelations) are presented in Chapter 2. Chapter 3 deals withcomplementary properties related to mixing, allowing us to characterize the decay of the timedependence. Chapter 4 is devoted to temporal aggregation: it studies the impact of the observationfrequency on the properties of GARCH processes.

Part II is concerned with statistical inference. We begin in Chapter 5 by studying the problemof identifying an appropriate model a priori . Then we present different estimation methods, startingwith the method of least squares in Chapter 6 which, limited to ARCH, offers the advantage ofsimplicity. The central part of the statistical study is Chapter 7, devoted to the quasi-maximumlikelihood method. For these models, testing the nullity of coefficients is not standard and is thesubject of Chapter 8. Optimality issues are discussed in Chapter 9, as well as alternative estimatorsallowing some of the drawbacks of standard methods to be overcome.

Part III is devoted to extensions and applications of the standard model. In Chapter 10, modelsallowing us to incorporate asymmetric effects in the volatility are discussed. There is no naturalextension of GARCH models for vector series, and many multivariate formulations are presentedin Chapter 11. Without carrying out an exhaustive statistical study, we consider the estimation ofa particular class of models which appears to be of interest for applications. Chapter 12 presentsapplications to finance. We first study the link between GARCH and diffusion processes, whenthe time step between two observations converges to zero. Two applications to finance are thenpresented: risk measurement and the pricing of derivatives.

Appendix A includes the probabilistic properties which are of most importance for the studyof GARCH models. Appendix B contains results on autocorrelations and partial autocorrelations.Appendix C provides solutions to the end-of-chapter exercises. Finally, a set of problems and (inmost cases) their solutions are provided in Appendix D.

For more information, please visit the author’s website http://perso.univ-lille3.fr/∼cfrancq/Christian-Francq/book-GARCH.html.

Page 15: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Notation

General notation:= ‘is defined as’x+, x− max{x, 0},max{−x, 0} (or min{x, 0} in Chapter 10)

Sets and spacesN,Z,Q,R positive integers, integers, rational numbers, real numbersR+ positive real lineRd d-dimensional Euclidean spaceDc complement of the set D ⊂ Rd

[a, b) half-closed interval

MatricesId d-dimensional identity matrixMp,q(R) the set of p × q real matrices

Processesiid independent and identically distributediid (0,1) iid centered with unit variance(Xt ) or (Xt )t∈Z discrete-time process(εt ) GARCH processσ 2t conditional variance or volatility(ηt ) strong white noise with unit varianceκη kurtosis coefficient of ηtL or B lag operatorσ {Xs; s < t} or Xt−1 sigma-field generated by the past of Xt

Functions1lA(x) 1 if x ∈ A, 0 otherwise[x] integer part of xγX, ρX autocovariance and autocorrelation functions of (Xt)

γX, ρX sample autocovariance and autocorrelation

ProbabilityN(m,) Gaussian law with mean m and covariance matrix

χ 2d chi-square distribution with d degrees of freedomχ2d (α) quantile of order α of the χ2

d distribution

Page 16: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

xiv NOTATION

L→ convergence in distributiona.s. almost surelyvn = oP (un) vn/un → 0 in probability

aoP (1)= b a equals b up to the stochastic order oP (1)

EstimationI Fisher information matrix(κη − 1)J−1 asymptotic variance of the QMLθ0 true parameter value� parameter setθ element of the parameter setθn, θ

cn, θn,f , ... estimators of θ0

σ 2t = σ 2

t (θ) volatility built with the value θσ 2t = σ 2

t (θ) as σ 2t but with initial values

t = t (θ) −2 log(conditional variance of εt ) t = t (θ) approximation of t , built with initial valuesVaras,Covas asymptotic variance and covariance

Some abbreviationsES expected shortfallFGLS feasible generalized least squaresOLS ordinary least squaresQML quasi-maximum likelihoodRMSE root mean square errorSACR sample autocorrelationSACV sample autocovarianceSPAC sample partial autocorrelationVaR value at risk

Page 17: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

1

Classical Time Series Modelsand Financial Series

The standard time series analysis rests on important concepts such as stationarity, autocorrelation,white noise, innovation, and on a central family of models, the autoregressive moving average(ARMA) models. We start by recalling their main properties and how they can be used. As weshall see, these concepts are insufficient for the analysis of financial time series. In particular, weshall introduce the concept of volatility, which is of crucial importance in finance.

In this chapter, we also present the main stylized facts (unpredictability of returns, volatilityclustering and hence predictability of squared returns, leptokurticity of the marginal distributions,asymmetries, etc.) concerning financial series.

1.1 Stationary Processes

Stationarity plays a central part in time series analysis, because it replaces in a naturalway the hypothesis of independent and identically distributed (iid) observations in standardstatistics.

Consider a sequence of real random variables (Xt)t∈Z, defined on the same probabilityspace. Such a sequence is called a time series, and is an example of a discrete-time stochasticprocess.

We begin by introducing two standard notions of stationarity.

Definition 1.1 (Strict stationarity) The process (Xt) is said to be strictly stationary if the vec-tors (X1, . . . , Xk)

′ and (X1+h, . . . , Xk+h)′ have the same joint distribution, for any k ∈ N andany h ∈ Z.

The following notion may seem less demanding, because it only constrains the first twomoments of the variables Xt , but contrary to strict stationarity, it requires the existence ofsuch moments.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 18: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

2 GARCH MODELS

Definition 1.2 (Second-order stationarity) The process (Xt ) is said to be second-orderstationary if:

(i) EX2t <∞, ∀t ∈ Z;

(ii) EXt = m, ∀t ∈ Z;(iii) Cov(Xt , Xt+h) = γX(h), ∀t, h ∈ Z.

The function γX(·) (ρX(·) := γX(·)/γX(0)) is called the autocovariance function (autocorrelationfunction) of (Xt).

The simplest example of a second-order stationary process is white noise. This process isparticularly important because it allows more complex stationary processes to be constructed.

Definition 1.3 (Weak white noise) The process (εt ) is called weak white noise if, for some pos-itive constant σ 2:

(i) Eεt = 0, ∀t ∈ Z;(ii) Eε2

t = σ 2, ∀t ∈ Z;(iii) Cov(εt , εt+h) = 0, ∀t, h ∈ Z, h = 0.

Remark 1.1 (Strong white noise) It should be noted that no independence assumption is madein the definition of weak white noise. The variables at different dates are only uncorrelated andthe distinction is particularly crucial for financial time series. It is sometimes necessary to replacehypothesis (iii) by the stronger hypothesis

(iii′) the variables εt and εt+h are independent and identically distributed.

The process (εt ) is then said to be strong white noise.

Estimating Autocovariances

The classical time series analysis is centered on the second-order structure of the processes. Gaus-sian stationary processes are completely characterized by their mean and their autocovariancefunction. For non-Gaussian processes, the mean and autocovariance give a first idea of the tem-poral dependence structure. In practice, these moments are unknown and are estimated from arealization of size n of the series, denoted X1, . . . , Xn. This step is preliminary to any constructionof an appropriate model. To estimate γ (h), we generally use the sample autocovariance defined,for 0 ≤ h < n, by

γ (h) = 1

n

n−h∑j=1

(Xj −X)(Xj+h − X) := γ (−h),

where X = (1/n)∑n

j=1 Xj denotes the sample mean. We similarly define the sample autocorrela-tion function by ρ(h) = γ (h)/γ (0) for |h| < n.

The previous estimators have finite-sample bias but are asymptotically unbiased. There areother similar estimators of the autocovariance function with the same asymptotic properties (forinstance, obtained by replacing 1/n by 1/(n− h)). However, the proposed estimator is to bepreferred over others because the matrix (γ (i − j)) is positive semi-definite (see Brockwell andDavis, 1991, p. 221).

It is of course not recommended to use the sample autocovariances when h is close to n, becausetoo few pairs (Xj ,Xj+h) are available. Box, Jenkins and Reinsel (1994, p. 32) suggest that usefulestimates of the autocorrelations can only be made if, approximately, n> 50 and h ≤ n/4.

Page 19: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CLASSICAL TIME SERIES MODELS AND FINANCIAL SERIES 3

It is often of interest to know – for instance, in order to select an appropriate model – ifsome or all the sample autocovariances are significantly different from 0. It is then necessary toestimate the covariance structure of those sample autocovariances. We have the following result(see Brockwell and Davis, 1991, pp. 222, 226).

Theorem 1.1 (Bartlett’s formulas for a strong linear process) Let (Xt) be a linear processsatisfying

Xt =∞∑

j=−∞φj εt−j ,

∞∑j=−∞

|φj | <∞,

where (εt ) is a sequence of iid variables such that

E(εt ) = 0, E(ε2t ) = σ 2, E(ε4

t ) = κεσ4 <∞.

Appropriately normalized, the sample autocovariances and autocorrelations are asymptotically nor-mal, with asymptotic variances given by the Bartlett formulas:

limn→∞ nCov{γ (h), γ (k)} =

∞∑i=−∞

γ (i)γ (i + k − h)+ γ (i + k)γ (i − h)

+ (κε − 3)γ (h)γ (k) (1.1)

and

limn→∞ nCov{ρ(h), ρ(k)} =

∞∑i=−∞

ρ(i)[2ρ(h)ρ(k)ρ(i) − 2ρ(h)ρ(i + k)− 2ρ(k)ρ(i + h)

+ ρ(i + k − h)+ ρ(i − k − h)]. (1.2)

Formula (1.2) still holds under the assumptions

Eε2t <∞,

∞∑j=−∞

|j |φ2j <∞.

In particular, if Xt = εt and Eε2t <∞, we have

√n

ρ(1)...

ρ(h)

L→ N (0, Ih) .

The assumptions of this theorem are demanding, because they require a strong white noise (εt ). Anextension allowing the strong linearity assumption to be relaxed is proposed in Appendix B.2. Formany nonlinear processes, in particular the ARCH processes studies in this book, the asymptoticcovariance of the sample autocovariances can be very different from (1.1) (Exercises 1.6 and 1.8).Using the standard Bartlett formula can lead to specification errors (see Chapter 5).

1.2 ARMA and ARIMA Models

The aim of time series analysis is to construct a model for the underlying stochastic process. Thismodel is then used for analyzing the causal structure of the process or to obtain optimal predictions.

Page 20: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

4 GARCH MODELS

The class of ARMA models is the most widely used for the prediction of second-orderstationary processes. These models can be viewed as a natural consequence of a fundamentalresult due to Wold (1938), which can be stated as follows: any centered, second-order stationary,and ‘purely nondeterministic’ process1 admits an infinite moving-average representation of the form

Xt = εt +∞∑i=1

ciεt−i , (1.3)

where (εt ) is the linear innovation process of (Xt), that is

εt = Xt − E(Xt |HX(t − 1)), (1.4)

where HX(t − 1) denotes the Hilbert space generated by the random variables Xt−1, Xt−2, . . . .2

and E(Xt|HX(t – 1)) denotes the orthogonal projection of Xt onto HX(t – 1). The sequence ofcoefficients (ci) is such that

∑i c

2i <∞. Note that (εt ) is a weak white noise.

Truncating the infinite sum in (1.3), we obtain the process

Xt(q) = εt +q∑i=1

ciεt−i ,

called a moving average process of order q, or MA(q). We have

‖Xt(q)−Xt‖22 = Eε2

t

∑i > q

c2i → 0, as q →∞.

It follows that the set of all finite-order moving averages is dense in the set of second-orderstationary and purely nondeterministic processes. The class of ARMA models is often preferredto the MA models for parsimony reasons, because they generally require fewer parameters.

Definition 1.4 (ARMA( p, q) process) A second-order stationary process (Xt) is calledARMA(p, q), where p and q are integers, if there exist real coefficients c, a1, . . . , ap, b1, . . . , bqsuch that,

∀t ∈ Z, Xt +p∑i=1

aiXt−i = c + εt +q∑

j=1

bj εt−j , (1.5)

where (εt ) is the linear innovation process of (Xt ).

This definition entails constraints on the zeros of the autoregressive and moving average poly-nomials, a(z) = 1 +∑p

i=0 aizi and b(z) = 1+∑q

i=0 bizi (Exercise 1.9). The main attraction of

this model, and the representations obtained by successively inverting the polynomials a(·) andb(·), is that it provides a framework for deriving the optimal linear predictions of the process, inmuch simpler way than by only assuming the second-order stationarity.

Many economic series display trends, making the stationarity assumption unrealistic. Suchtrends often vanish when the series is differentiated, once or several times. Let �Xt = Xt −Xt−1 denote the first-difference series, and let �dXt = �(�d−1Xt) (with �0Xt = Xt ) denote thedifferences of order d .

1A stationary process (Xt ) is said to be purely nondeterministic if and only if⋂∞

n=−∞HX(n) = {0}, whereHX(n) denotes, in the Hilbert space of the real, centered, and square integrable variables, the subspace consti-tuted by the limits of the linear combinations of the variables Xn−i , i ≥ 0. Thus, for a purely nondeterministic(or regular) process, the linear past, sufficiently far away in the past, is of no use in predicting future values.See Brockwell and Davis (1991, pp. 187–189) or Azencott and Dacunha-Castelle (1984) for more details.

2 In this representation, the equivalence class E(Xt |HX(t − 1)) is identified with a random variable.

Page 21: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CLASSICAL TIME SERIES MODELS AND FINANCIAL SERIES 5

Definition 1.5 (ARIMA( p, d, q) process) Let d be a positive integer. The process (Xt) is said tobe an ARIMA(p, d, q) process if, for k = 0, . . . , d − 1, the processes (�kXt ) are not second-orderstationary, and (�dXt) is an ARMA(p, q) process.

The simplest ARIMA process is the ARIMA(0, 1, 0), also called the random walk, satisfying

Xt = εt + εt−1 + · · · + ε1 +X0, t ≥ 1,

where εt is a weak white noise.For statistical convenience, ARMA (and ARIMA) models are generally used under stronger

assumptions on the noise than that of weak white noise. Strong ARMA refers to the ARMA modelof Definition 1.4 when εt is assumed to be a strong white noise. This additional assumption allowsus to use convenient statistical tools developed in this framework, but considerably reduces thegenerality of the ARMA class. Indeed, assuming a strong ARMA is tantamount to assuming that(i) the optimal predictions of the process are linear ((εt ) being the strong innovation of (Xt )) and(ii) the amplitudes of the prediction intervals depend on the horizon but not on the observations.We shall see in the next section how restrictive this assumption can be, in particular for financialtime series modeling.

The orders (p, q) of an ARMA process are fully characterized through its autocorrelationfunction (see Brockwell and Davis, 1991, pp. 89–90, for a proof).

Theorem 1.2 (Characterization of an ARMA process) Let (Xt ) denote a second-order station-ary process. We have

ρ(h)+p∑i=1

aiρ(h− i) = 0, for all |h|>q,

if and only if (Xt ) is an ARMA(p, q) process.

To close this section, we summarize the method for time series analysis proposed in the famousbook by Box and Jenkins (1970). To simplify presentation, we do not consider seasonal series, forwhich SARIMA models can be considered.

Box–Jenkins Methodology

The aim of this methodology is to find the most appropriate ARIMA(p, d, q) model and to use itfor forecasting. It uses an iterative six-stage scheme:

(i) a priori identification of the differentiation order d (or choice of another transformation);

(ii) a priori identification of the orders p and q;

(iii) estimation of the parameters (a1, . . . , ap, b1, . . . , bq and σ 2 = Var εt );

(iv) validation;

(v) choice of a model;

(vi) prediction.

Although many unit root tests have been introduced in the last 30 years, step (i) is still essentiallybased on examining the graph of the series. If the data exhibit apparent deviations from stationarity,it will not be appropriate to choose d = 0. For instance, if the amplitude of the variations tends

Page 22: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

6 GARCH MODELS

Inde

x va

lue

2000

3000

4000

5000

6000

7000

19/Aug/91 11/Sep/01 21/Jan/08

Figure 1.1 CAC 40 index for the period from March 1, 1990 to October 15, 2008 (4702observations).

to increase, the assumption of constant variance can be questioned. This may be an indicationthat the underlying process is heteroscedastic.3 If a regular linear trend is observed, positive ornegative, it can be assumed that the underlying process is such that EXt = at + b with a = 0. Ifthis assumption is correct, the first-difference series �Xt = Xt −Xt−1 should not show any trend(E�Xt = a) and could be stationary. If no other sign of nonstationarity can be detected (suchas heteroscedasticity), the choice d = 1 seems suitable. The random walk (whose sample pathsmay resemble the graph of Figure 1.1), is another example where d = 1 is required, although thisprocess does not have any deterministic trend.

Step (ii) is more problematic. The primary tool is the sample autocorrelation function. If,for instance, we observe that ρ(1) is far away from 0 but that for any h> 1, ρ(h) is close to0,4 then, from Theorem 1.1, it is plausible that ρ(1) = 0 and ρ(h) = 0 for all h> 1. In thiscase, Theorem 1.2 entails that Xt is an MA(1) process. To identify AR processes, the partialautocorrelation function (see Appendix B.1) plays an analogous role. For mixed models (that is,ARMA(p, q) with pq = 0), more sophisticated statistics can be used, as will be seen in Chapter 5.Step (ii) often results in the selection of several candidates (p1, q1), . . . , (pk, qk) for the ARMAorders. These k models are estimated in step (iii), using, for instance, the least-squares method.The aim of step (iv) is to gauge if the estimated models are reasonably compatible with the data.An important part of the procedure is to examine the residuals which, if the model is satisfactory,should have the appearance of white noise. The correlograms are examined and portmanteau testsare used to decide if the residuals are sufficiently close to white noise. These tools will be describedin detail in Chapter 5. When the tests on the residuals fail to reject the model, the significance ofthe estimated coefficients is studied. Testing the nullity of coefficients sometimes allows the modelto be simplified. This step may lead to rejection of all the estimated models, or to considerationof other models, in which case we are brought back to step (i) or (ii). If several models pass thevalidation step (iv), selection criteria can be used, the most popular being the Akaike (AIC) andBayesian (BIC) information criteria. Complementing these criteria, the predictive properties of themodels can be considered: different models can lead to almost equivalent predictive formulas. Theparsimony principle would thus lead us to choose the simplest model, the one with the fewestparameters. Other considerations can also come into play: for instance, models frequently involvea lagged variable at the order 12 for monthly data, but this would seem less natural for weekly data.

3 In contrast, a process such that VarXt is constant is called (marginally) homoscedastic.4 More precisely, for h> 1,

√n|ρ(h)|/

√1+ 2ρ2(1) is a plausible realization of the |N(0, 1)| distribution.

Page 23: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CLASSICAL TIME SERIES MODELS AND FINANCIAL SERIES 7

If the model is appropriate, step (vi) allows us to easily compute the best linear predictions Xt (h)

at horizon h = 1, 2, . . . . Recall that these linear predictions do not necessarily lead to minimalquadratic errors. Nonlinear models, or nonparametric methods, sometimes produce more accuratepredictions. Finally, the interval predictions obtained in step (vi) of the Box–Jenkins methodologyare based on Gaussian assumptions. Their magnitude does not depend on the data, which forfinancial series is not appropriate, as we shall see.

1.3 Financial Series

Modeling financial time series is a complex problem. This complexity is not only due to the varietyof the series in use (stocks, exchange rates, interest rates, etc.), to the importance of the frequencyof d’observation (second, minute, hour, day, etc) or to the availability of very large data sets. It ismainly due to the existence of statistical regularities (stylized facts) which are common to a largenumber of financial series and are difficult to reproduce artificially using stochastic models.

Most of these stylized facts were put forward in a paper by Mandelbrot (1963). Since then,they have been documented, and completed, by many empirical studies. They can be observedmore or less clearly depending on the nature of the series and its frequency. The properties thatwe now present are mainly concerned with daily stock prices.

Let pt denote the price of an asset at time t and let εt = log(pt /pt−1) be the continuouslycompounded or log return (also simply called the return). The series (εt ) is often close to the seriesof relative price variations rt = (pt − pt−1)/pt−1, since εt = log(1+ rt ). In contrast to the prices,the returns or relative prices do not depend on monetary units which facilitates comparisons betweenassets. The following properties have been amply commented upon in the financial literature.

(i) Nonstationarity of price series . Samples paths of prices are generally close to a randomwalk without intercept (see the CAC index series5 displayed in Figure 1.1). On the otherhand, sample paths of returns are generally compatible with the second-order stationarityassumption. For instance, Figures 1.2 and 1.3 show that the returns of the CAC index

Ret

urn

0−5

−10

510

19/Aug/91 11/Sep/01 21/Jan/08

Figure 1.2 CAC 40 returns (March 2, 1990 to October 15, 2008). August 19, 1991, Soviet Putschattempt; September 11, 2001, fall of the Twin Towers; January 21, 2008, effect of the subprimemortgage crisis; October 6, 2008, effect of the financial crisis.

5 The CAC 40 index is a linear combination of a selection of 40 shares on the Paris Stock Exchange (CACstands for ‘Cotations Assistees en Continu’).

Page 24: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

8 GARCH MODELS

Ret

urn

−5−1

00

510

21/Jan/08 06/Oct/08

Figure 1.3 Returns of the CAC 40 (January 2, 2008 to October 15, 2008).

oscillate around zero. The oscillations vary a great deal in magnitude, but are almost constantin average over long subperiods. The recent extreme volatility of prices, induced by thefinancial crisis of 2008, is worth noting.

(ii) Absence of autocorrelation for the price variations . The series of price variations generallydisplays small autocorrelations, making it close to a white noise. This is illustrated for the

0 5 10 15 20 25 30 35

−0.2

0.0

0.2

0.4

Lag

Aut

ocor

rela

tion

(a)

(b)

155 20100 25 30 35

−0.2

0.0

0.2

0.4

Lag

Aut

ocor

rela

tion

Figure 1.4 Sample autocorrelations of (a) returns and (b) squared returns of the CAC 40(January 2, 2008 to October 15, 2008).

Page 25: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CLASSICAL TIME SERIES MODELS AND FINANCIAL SERIES 9

CAC in Figure 1.4(a). The classical significance bands are used here, as an approximation,but we shall see in Chapter 5 that they must be corrected when the noise is not independent.Note that for intraday series, with very small time intervals between observations (measuredin minutes or seconds) significant autocorrelations can be observed due to the so-calledmicrostructure effects.

(iii) Autocorrelations of the squared price returns . Squared returns (ε2t ) or absolute returns (|εt |)

are generally strongly autocorrelated (see Figure 1.4(b)). This property is not incompatiblewith the white noise assumption for the returns, but shows that the white noise is not strong.

(iv) Volatility clustering. Large absolute returns |εt | tend to appear in clusters. This property isgenerally visible on the sample paths (as in Figure 1.3). Turbulent (high-volatility) subperiodsare followed by quiet (low-volatility) periods. These subperiods are recurrent but do notappear in a periodic way (which might contradict the stationarity assumption). In otherwords, volatility clustering is not incompatible with a homoscedastic (i.e. with a constantvariance) marginal distribution for the returns.

(v) Fat-tailed distributions . When the empirical distribution of daily returns is drawn, one cangenerally observe that it does not resemble a Gaussian distribution. Classical tests typicallylead to rejection of the normality assumption at any reasonable level. More precisely, thedensities have fat tails (decreasing to zero more slowly than exp(−x2/2)) and are sharplypeaked at zero: they are called leptokurtic. A measure of the leptokurticity is the kurtosiscoefficient, defined as the ratio of the sample fourth-order moment to the squared samplevariance. Asymptotically equal to 3 for Gaussian iid observations, this coefficient is muchgreater than 3 for returns series. When the time interval over which the returns are com-puted increases, leptokurticity tends to vanish and the empirical distributions get closer to aGaussian. Monthly returns, for instance, defined as the sum of daily returns over the month,have a distribution that is much closer to the normal than daily returns. Figure 1.5 compares

−10 −5 50 10

0.0

0.1

0.2

0.3

Den

sity

Figure 1.5 Kernel estimator of the CAC 40 returns density (solid line) and density of a Gaussianwith mean and variance equal to the sample mean and variance of the returns (dotted line).

Page 26: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

10 GARCH MODELS

Table 1.1 Sample autocorrelations of returns εt (CAC 40 index, January 2, 2008 to October 15,2008), of absolute returns |εt |, sample correlations between ε+t−h and |εt |, and between −ε−t−h and|εt |.h 1 2 3 4 5 6 7

ρε(h) −0.012 −0.014 −0.047 0.025 −0.043 −0.023 −0.014ρ|ε|(h) 0.175 0.229 0.235 0.200 0.218 0.212 0.203ρ(ε+t−h, |εt |) 0.038 0.059 0.051 0.055 0.059 0.109 0.061ρ(−ε−t−h, |εt |) 0.160 0.200 0.215 0.173 0.190 0.136 0.173

We use here the notation ε+t = max(εt , 0) and ε−t = min(εt , 0).

a kernel estimator of the density of the CAC returns with a Gaussian density. The peakaround zero appears clearly, but the thickness of the tails is more difficult to visualize.

(vi) Leverage effects . The so-called leverage effect was noted by Black (1976), and involvesan asymmetry of the impact of past positive and negative values on the current volatility.Negative returns (corresponding to price decreases) tend to increase volatility by a largeramount than positive returns (price increases) of the same magnitude. Empirically, a positivecorrelation is often detected between ε+t = max(εt , 0) and |εt+h| (a price increase shouldentail future volatility increases), but, as shown in Table 1.1, this correlation is generallyless than between −ε−t = max(−εt , 0) and |εt+h|.

(vii) Seasonality . Calendar effects are also worth mentioning. The day of the week, the proximityof holidays, among other seasonalities, may have significant effects on returns. Following aperiod of market closure, volatility tends to increase, reflecting the information cumulatedduring this break. However, it can be observed that the increase is less than if the informationhad cumulated at constant speed. Let us also mention that the seasonal effect is also verypresent for intraday series.

1.4 Random Variance Models

The previous properties illustrate the difficulty of financial series modeling. Any satisfactory sta-tistical model for daily returns must be able to capture the main stylized facts described in theprevious section. Of particular importance are the leptokurticity, the unpredictability of returns, andthe existence of positive autocorrelations in the squared and absolute returns. Classical formulations(such as ARMA models) centered on the second-order structure are inappropriate. Indeed, thesecond-order structure of most financial time series is close to that of white noise.

The fact that large absolute returns tend to be followed by large absolute returns (whateverthe sign of the price variations) is hardly compatible with the assumption of constant conditionalvariance. This phenomenon is called conditional heteroscedasticity :

Var(εt | εt−1, εt−2, . . . ) ≡ const.

Conditional heteroscedasticity is perfectly compatible with stationarity (in the strict and second-order senses), just as the existence of a nonconstant conditional mean is compatible with station-arity. The GARCH processes studied in this book will amply illustrate this point.

The models introduced in the econometric literature to account for the very specific natureof financial series (price variations or log-returns, interest rates, etc.) are generally written in themultiplicative form

εt = σtηt (1.6)

Page 27: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CLASSICAL TIME SERIES MODELS AND FINANCIAL SERIES 11

where (ηt ) and (σt ) are real processes such that:

(i) σt is measurable with respect to a σ -field, denoted Ft−1;

(ii) (ηt ) is an iid centered process with unit variance, ηt being independent of Ft−1 and σ(εu;u < t);

(iii) σt > 0.

This formulation implies that the sign of the current price variation (that is, the sign of εt ) is thatof ηt , and is independent of past price variations. Moreover, if the first two conditional momentsof εt exist, they are given by

E(εt | Ft−1) = 0, E(ε2t | Ft−1) = σ 2

t .

The random variable σt is called the volatility6 of εt .It may also be noted that (under existence assumptions)

E(εt ) = E(σt )E(ηt ) = 0

andCov(εt , εt−h) = E(ηt )E(σtεt−h) = 0, ∀h> 0,

which makes (εt ) a weak white noise. The series of squares, on the other hand, generally havenonzero autocovariances: (εt ) is thus not a strong white noise.

The kurtosis coefficient of εt , if it exists, is related to that of ηt , denoted κη, by

E(ε4t )

{E(ε2t )}2

= κη

[1+ Var(σ 2

t )

{E(σ 2t )}2

]. (1.7)

This formula shows that the leptokurticity of financial time series can be taken into account in twodifferent ways: either by using a leptokurtic distribution for the iid sequence (ηt ), or by specifyinga process (σ 2

t ) with a great variability.Different classes of models can be distinguished depending on the specification adopted for σt :

(i) Conditionally heteroscedastic (or GARCH-type) processes for which Ft−1 = σ(εs; s < t) isthe σ -field generated by the past of εt . The volatility is here a deterministic function of thepast of εt . Processes of this class differ by the choice of a specification for this function.The standard GARCH models are characterized by a volatility specified as a linear functionof the past values of ε2

t . They will be studied in detail in Chapter 2.

(ii) Stochastic volatility processes7 for which Ft−1 is the σ -field generated by {vt , vt−1, . . .},where (vt ) is a strong white noise and is independent of (ηt ). In these models, volatility is alatent process. The most popular model in this class assumes that the process logσt followsan AR(1) of the form

log σt = ω + φ log σt−1 + vt ,

where the noises (vt ) and (ηt ) are independent.

(iii) Switching-regime models for which σt = σ(�t ,Ft−1), where (�t ) is a latent (unobservable)integer-valued process, independent of (ηt ). The state of the variable �t is here interpretedas a regime and, conditionally on this state, the volatility of εt has a GARCH specification.The process (�t ) is generally supposed to be a finite-state Markov chain. The models arethus called Markov-switching models.

6 There is no general agreement concerning the definition of this concept in the literature. Volatility some-times refers to a conditional standard deviation, and sometimes to a conditional variance.

7 Note, however, that the volatility is also a random variable in GARCH-type processes.

Page 28: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

12 GARCH MODELS

1.5 Bibliographical Notes

The time series concepts presented in this chapter are the subject of numerous books. Two classicalreferences are Brockwell and Davis (1991) and Gourieroux and Monfort (1995, 1996).

The assumption of iid Gaussian price variations has long been predominant in the financeliterature and goes back to the dissertation by Bachelier (1900), where a precursor of Brownianmotion can be found. This thesis, ignored for a long time until its rediscovery by Kolmogorov in1931 (see Kahane, 1998), constitutes the historical source of the link between Brownian motionand mathematical finance. Nonetheless, it relies on only a rough description of the behavior offinancial series. The stylized facts concerning these series can be attributed to Mandelbrot (1963)and Fama (1965). Based on the analysis of many stock returns series, their studies showed theleptokurticity, hence the non-Gaussianity, of marginal distributions, some temporal dependenciesand nonconstant volatilities. Since then, many empirical studies have confirmed these findings.See, for instance, Taylor (2007) for a recent presentation of the stylized facts of financial timesseries. In particular, the calendar effects are discussed in detail.

As noted by Shephard (2005), a precursor article on ARCH models is that of Rosenberg (1972).This article shows that the decomposition (1.6) allows the leptokurticity of financial series to bereproduced. It also proposes some volatility specifications which anticipate both the GARCH andstochastic volatility models. However, the GARCH models to be studied in the next chapters arenot discussed in this article. The decomposition of the kurtosis coefficient in (1.7) can be found inClark (1973).

A number of surveys have been devoted to GARCH models. See, among others, Boller-slev, Chou and Kroner (1992), Bollerslev, Engle and Nelson (1994), Pagan (1996), Palm (1996),Shephard (1996), Kim, Shephard, and Chib (1998), Engle (2001, 2002b, 2004), Engle and Pat-ton (2001), Diebold (2004), Bauwens, Laurent and Rombouts (2006) and Giraitis et al. (2006).Moreover, the books by Gourieroux (1997) and Xekalaki and Degiannakis (2009) are devoted toGARCH and several books devote a chapter to GARCH: Mills (1993), Hamilton (1994), Fransesand van Dijk (2000), Gourieroux and Jasiak (2001), Tsay (2002), Franke, Hardle and Hafner(2004), McNeil, Frey and Embrechts (2005), Taylor (2007) and Andersen et al. (2009). See alsoMikosch (2001).

Although the focus of this book is on financial applications, it is worth mentioning that GARCHmodels have been used in other areas. Time series exhibiting GARCH-type behavior have alsoappeared, for example, in speech signals (Cohen, 2004; Cohen, 2006; Abramson and Cohen,2008), daily and monthly temperature measurements (Tol, 1996; Campbell and Diebold, 2005;Romilly, 2005; Huang, Shiu, and Lin, 2008), wind speeds (Ewing, Kruse, and Schroeder, 2006),and atmospheric CO2 concentrations (Hoti, McAleer, and Chan, 2005; McAleer and Chan, 2006).

Most econometric software (for instance, GAUSS, R, RATS, SAS and SPSS) incorporatesroutines that permit the estimation of GARCH models. Readers interested in the implementationwith Ox may refer to Laurent (2009).

Stochastic volatility models are not treated in this book. One may refer to the book by Taylor(2007), and to the references therein. For switching regimes models, two recent references are themonographs by Cappe, Moulines and Ryden (2005), and by Fruhwirth-Schnatter (2006).

1.6 Exercises

1.1 (Stationarity, ARMA models, white noises)Let (ηt ) denote an iid centered sequence with unit variance (and if necessary with a finitefourth-order moment).

1. Do the following models admit a stationary solution? If yes, derive the expectation andthe autocorrelation function of this solution.

Page 29: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CLASSICAL TIME SERIES MODELS AND FINANCIAL SERIES 13

(a) Xt = 1+ 0.5Xt−1 + ηt ;

(b) Xt = 1+ 2Xt−1 + ηt ;

(c) Xt = 1+ 0.5Xt−1 + ηt − 0.4ηt−1.

2. Identify the ARMA models compatible with the following recursive relations, where ρ(·)denotes the autocorrelation function of some stationary process:

(a) ρ(h) = 0.4ρ(h− 1), for all h> 2;

(b) ρ(h) = 0, for all h> 3;

(c) ρ(h) = 0.2ρ(h− 2), for all h> 1.

3. Verify that the following processes are white noises and decide if they are weak or strong.

(a) εt = η2t − 1;

(b) εt = ηtηt−1;

1.2 (A property of the sum of the sample autocorrelations)Let

γ (h) = γ (−h) = 1

n

n−h∑t=1

(Xt −Xn)(Xt+h − Xn), h = 0, . . . , n− 1,

denote the sample autocovariances of real observations X1, . . . , Xn. Set ρ(h) = ρ(−h) =γ (h)/γ (0) for h = 0, . . . , n− 1. Show that

n−1∑h=1

ρ(h) = −1

2.

1.3 (It is impossible to decide whether a process is stationary from a path)Show that the sequence

{(−1)t

}t=0,1,... can be a realization of a nonstationary process. Show

that it can also be a realization of a stationary process. Comment on the consequences of thisresult.

1.4 (Stationarity and ergodicity from a path)Can the sequence 0, 1, 0, 1, . . . be a realization of a stationary process or of a stationary andergodic process? The definition of ergodicity can be found in Appendix A.1.

1.5 (A weak white noise which is not semi-strong)Let (ηt ) denote an iid N(0, 1) sequence and let k be a positive integer. Set εt = ηtηt−1 . . . ηt−k .Show that (εt ) is a weak white noise, but is not a strong white noise.

1.6 (Asymptotic variance of sample autocorrelations of a weak white noise)Consider the white noise εt of Exercise 1.5. Compute limn→∞ nVar ρ(h) where h = 0 andρ(·) denotes the sample autocorrelation function of ε1, . . . , εn. Compare this asymptoticvariance with that obtained from the usual Bartlett formula.

1.7 (ARMA representation of the square of a weak white noise)Consider the white noise εt of Exercise 1.5. Show that ε2

t follows an ARMA process. Makethe ARMA representation explicit when k = 1.

1.8 (Asymptotic variance of sample autocorrelations of a weak white noise)Repeat Exercise 1.6 for the weak white noise εt = ηt/ηt−k , where (ηt ) is an iid sequencesuch that Eη4

t <∞ and Eη−2t <∞, and k is a positive integer.

Page 30: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

14 GARCH MODELS

5 10 15 20 25 30 35

−0.05

−0.025

0.025

0.05

0.075

0.1

5 10 15 20 25 30 35

−0.04

−0.02

0.02

0.04

0.06

hh

0.08

0.1

Figure 1.6 Sample autocorrelations ρ(h) (h = 1, . . . , 36) of (a) the S&P 500 index fromJanuary 3, 1979 to December 30, 2001, and (b) the squared index. The interval between thedashed lines (±1.96/

√n, where n = 5804 is the sample length) should contain approximately

95% of a strong white noise.

1.9 (Stationary solutions of an AR(1))Let (ηt )t∈Z be an iid centered sequence with variance σ 2 > 0, and let a = 0. Consider theAR(1) equation

Xt − aXt−1 = ηt , t ∈ Z. (1.8)

1. Show that for |a| < 1, the infinite sum

Xt =∞∑k=0

akηt−k

converges in quadratic mean and almost surely, and that it is the unique stationary solutionof (1.8).

2. For |a| = 1, show that no stationary solution exists.

3. For |a|> 1, show that

Xt = −∞∑k=1

1

akηt+k

is the unique stationary solution of (1.8).

4. For |a|> 1, show that the causal representation

Xt − 1

aXt−1 = εt , t ∈ Z, (1.9)

holds, where (εt )t∈Z is a white noise.

1.10 (Is the S&P 500 a white noise?)Figure 1.6 displays the correlogram of the S&P 500 returns from January 3, 1979 toDecember 30, 2001, as well as the correlogram of the squared returns. Is it reasonable tothink that this index is a strong white noise or a weak white noise?

1.11 (Asymptotic covariance of sample autocovariances)Justify the equivalence between (B.18) and (B.14) in the proof of the generalized Bartlettformula of Appendix B.2.

Page 31: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

CLASSICAL TIME SERIES MODELS AND FINANCIAL SERIES 15

1.12 (Asymptotic independence between the ρ(h) for a noise)Simplify the generalized Bartlett formulas (B.14) and (B.15) when X = ε is a pure whitenoise.In an autocorrelogram, consider the random number M of sample autocorrelations fallingoutside the significance region (at the level 95%, say), among the first m autocorrelations.How can the previous result be used to evaluate the variance of this number when the observedprocess is a white noise (satisfying the assumptions allowing (B.15) to be used)?

1.13 (An incorrect interpretation of autocorrelograms)Some practitioners tend to be satisfied with an estimated model only if all sample autocorre-lations fall within the 95% significance bands. Show, using Exercise 1.12, that based on 20autocorrelations, say, this approach leads to wrongly rejecting a white noise with a very highprobability.

1.14 (Computation of partial autocorrelations)Use the algorithm in (B.7) – (B.9) to compute rX(1), rX(2) and rX(3) as a function of ρX(1),ρX(2) and ρX(3).

1.15 (Empirical application)Download from http://fr.biz.yahoo.com//bourse/accueil.html for instance, astock index such as the CAC 40. Draw the series of closing prices, the series of returns, theautocorrelation function of the returns, and that of the squared returns. Comment on thesegraphs.

Page 32: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 33: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Part I

Univariate GARCH Models

Page 34: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 35: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

2

GARCH( p, q) Processes

Autoregressive conditionally heteroscedastic (ARCH) models were introduced by Engle (1982)and their GARCH (generalized ARCH) extension is due to Bollerslev (1986). In these models,the key concept is the conditional variance, that is, the variance conditional on the past. In theclassical GARCH models, the conditional variance is expressed as a linear function of the squaredpast values of the series. This particular specification is able to capture the main stylized factscharacterizing financial series, as described in Chapter 1. At the same time, it is simple enough toallow for a complete study of the solutions. The ‘linear’ structure of these models can be displayedthrough several representations that will be studied in this chapter.

We first present definitions and representations of GARCH models. Then we establish thestrict and second-order stationarity conditions. Starting with the first-order GARCH model, forwhich the proofs are easier and the results are more explicit, we extend the study to the generalcase. We also study the so-called ARCH(∞) models, which allow for a slower decay of squared-return autocorrelations. Then, we consider the existence of moments and the properties of theautocorrelation structure. We conclude this chapter by examining forecasting issues.

2.1 Definitions and Representations

We start with a definition of GARCH processes based on the first two conditional moments.

Definition 2.1 (GARCH( p, q) process) A process (εt ) is called a GARCH(p, q) process if itsfirst two conditional moments exist and satisfy:

(i) E(εt | εu, u < t) = 0, t ∈ Z.

(ii) There exist constants ω, αi , i = 1, . . . , q and βj , j = 1, . . . , p such that

σ 2t = Var(εt | εu, u < t) = ω +

q∑i=1

αiε2t−i +

p∑j=1

βjσ2t−j , t ∈ Z. (2.1)

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 36: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

20 GARCH MODELS

Equation (2.1) can be written in a more compact way as

σ 2t = ω + α(B)ε2

t + β(B)σ 2t , t ∈ Z, (2.2)

where B is the standard backshift operator (Biε2t = ε2

t−i and Biσ 2t = σ 2

t−i for any integer i), andα and β are polynomials of degrees q and p, respectively:

α(B) =q∑i=1

αiBi, β(B) =

p∑j=1

βjBj .

If β(z) = 0 we have

σ 2t = ω +

q∑i=1

αiε2t−i (2.3)

and the process is called an ARCH(q) process.1 By definition, the innovation of the process ε2t

is the variable νt = ε2t − σ 2

t . Substituting in (2.1) the variables σ 2t−j by ε2

t−j − νt−j , we get therepresentation

ε2t = ω +

r∑i=1

(αi + βi)ε2t−i + νt −

p∑j=1

βjνt−j , t ∈ Z, (2.4)

where r = max(p, q), with the convention αi = 0 (βj = 0) if i > q (j >p). This equation has thelinear structure of an ARMA model, allowing for simple computation of the linear predictions.Under additional assumptions (implying the second-order stationarity of ε2

t ), we can state thatif (εt ) is GARCH(p, q), then (ε2

t ) is an ARMA(r, p) process. In particular, the square of anARCH(q) process admits, if it is stationary, an AR(q) representation. The ARMA representationwill be useful for the estimation and identification of GARCH processes.2

Remark 2.1 (Correlation of the squares of a GARCH) We observed in Chapter 2 that a char-acteristic feature of financial series is that squared returns are autocorrelated, while returns arenot. The representation (2.4) shows that GARCH processes are able to capture this empirical fact.If the fourth-order moment of (εt ) is finite, the sequence of the h-order autocorrelations of ε2

t

is the solution of a recursive equation which is characteristic of ARMA models. For the sake ofsimplicity, consider the GARCH(1, 1) case. The squared process (ε2

t ) is ARMA(1, 1), and thus itsautocorrelation decreases to zero proportionally to (α1 + β1)

h: for h> 1,

Corr(ε2t , ε

2t−h) = K(α1 + β1)

h,

where K is a constant independent of h. Moreover, the εt ’s are uncorrelated in view of (i) inDefinition 2.1.

Definition 2.1 does not directly provide a solution process satisfying those conditions. Thenext definition is more restrictive but allows explicit solutions to be obtained. The link betweenthe two definitions will be given in Remark 2.5. Let η denote a probability distribution with nullexpectation and unit variance.

1 This specification quickly turned out to be too restrictive when applied to financial series. Indeed, a largenumber of past variables have to be included in the conditional variance to obtain a good model fit. Choosinga large value for q is not satisfactory from a statistical point of view because it requires a large number ofcoefficients to be estimated.

2 It cannot be used to study the existence of stationary solutions, however, because the process (νt ) is notan iid process.

Page 37: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 21

Definition 2.2 (Strong GARCH(p, q) process) Let (ηt ) be an iid sequence with distribution η.The process (εt ) is called a strong GARCH(p, q) (with respect to the sequence (ηt )) if{

εt = σtηtσ 2t = ω +∑q

i=1 αiε2t−i +

∑p

j=1 βjσ2t−j

(2.5)

where the αi and βj are nonnegative constants and ω is a (strictly) positive constant.

GARCH processes in the sense of Definition 2.1 are sometimes called semi-strong following thepaper by Drost and Nijman (1993) on temporal aggregation. Substituting εt−i by σt−iηt−i in (2.1),we get

σ 2t = ω +

q∑i=1

αiσ2t−iη

2t−i +

p∑j=1

βjσ2t−j ,

which can be written as

σ 2t = ω +

r∑i=1

ai(ηt−i )σ 2t−i , (2.6)

where ai(z) = αiz2 + βi, i = 1, . . . , r. This representation shows that the volatility process of a

strong GARCH is the solution of an autoregressive equation with random coefficients.

Properties of Simulated Paths

Contrary to standard time series models (ARMA), the GARCH structure allows the magnitude ofthe noise εt to be a function of its past values. Thus, periods with high volatility level (correspondingto large values of ε2

t−i) will be followed by periods where the fluctuations have a smaller amplitude.Figures 2.1–2.7 illustrate the volatility clustering for simulated GARCH models. Large absolutevalues are not uniformly distributed on the whole period, but tend to cluster. We will see that allthese trajectories correspond to strictly stationary processes which, except for the ARCH(1) modelsof Figures 2.3–2.5, are also second-order stationary. Even if the absolute values can be extremelylarge, these processes are not explosive, as can be seen from these figures. Higher values of α(theoretically α > 3.56 for the N(0, 1) distribution, as will be established below) lead to explosivepaths. Figures 2.6 and 2.7, corresponding to GARCH(1, 1) models, have been obtained with thesame simulated sequence (ηt ). As we will see, permuting α and β does not modify the varianceof the process but has an effect on the higher-order moments. For instance the simulated process

100 200 300 400 500

−4

−2

2

4

Figure 2.1 Simulation of size 500 of the ARCH(1) process with ω = 1, α = 0.5 and ηt ∼N(0, 1).

Page 38: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

22 GARCH MODELS

100 200 300 400 500

−10

−5

5

10

Figure 2.2 Simulation of size 500 of the ARCH(1) process with ω = 1, α = 0.95 and ηt ∼N(0, 1).

100 200 300 400 500

−20

−15

−10

−5

5

10

15

20

Figure 2.3 Simulation of size 500 of the ARCH(1) process with ω = 1, α = 1.1 and ηt ∼N(0, 1).

50 100 150 200

−20000

−15000

−10000

−5000

5000

10000

15000

20000

Figure 2.4 Simulation of size 200 of the ARCH(1) process with ω = 1, α = 3 and ηt ∼ N(0, 1).

Page 39: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 23

10 20 30 40

−500

−400

−300

−200

−100

100

200

Figure 2.5 Observations 100–140 of Figure 2.4.

100 200 300 400 500

−15

−10

−5

5

10

15

Figure 2.6 Simulation of size 500 of the GARCH(1, 1) process with ω = 1, α = 0.2, β = 0.7and ηt ∼ N(0, 1).

100 200 300 400 500

−15

−10

−5

5

10

15

Figure 2.7 Simulation of size 500 of the GARCH(1, 1) process with ω = 1, α = 0.7, β = 0.2and ηt ∼ N(0, 1).

Page 40: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

24 GARCH MODELS

of Figure 2.7, with α = 0.7 and β = 0.2, does not admit a fourth-order moment, in contrast to theprocess of Figure 2.6. This is reflected by the presence of larger absolute values in Figure 2.7.The two processes are also different in terms of persistence of shocks : when β approaches 1, ashock on the volatility has a persistent effect. On the other hand, when α is large, sudden volatilityvariations can be observed in response to shocks.

2.2 Stationarity Study

This section is concerned with the existence of stationary solutions (in the strict and second-ordersenses) to model (2.5). We are mainly interested in nonanticipative solutions, that is, processes(εt ) such that εt is a measurable function of the variables ηt−s , s ≥ 0. For such processes, σtis independent of the σ -field generated by {ηt+h, h ≥ 0} and εt is independent of the σ -fieldgenerated by {ηt+h, h> 0}. It will be seen that such solutions are also ergodic. The concept ofergodicity is discussed in Appendix A.1. We first consider the GARCH(1, 1) model, which can bestudied in a more explicit way than the general case. For x > 0, let log+ x = max(log x, 0).

2.2.1 The GARCH(1, 1) CaseWhen p = q = 1, model (2.5) has the form{

εt = σtηt , (ηt ) iid (0, 1),

σ 2t = ω + αε2

t−1 + βσ 2t−1,

(2.7)

with ω ≥ 0, α ≥ 0, β ≥ 0. Let a(z) = αz2 + β.

Theorem 2.1 (Strict stationarity of the strong GARCH(1, 1) process) If

−∞ ≤ γ := E log{αη2t + β} < 0, (2.8)

then the infinite sum

ht ={

1+∞∑i=1

a(ηt−1) . . . a(ηt−i )

}ω (2.9)

converges almost surely (a.s.) and the process (εt ) defined by εt =√htηt is the unique strictly

stationary solution of model (2.7). This solution is nonanticipative and ergodic. If γ ≥ 0 and ω> 0,there exists no strictly stationary solution.

Remark 2.2 (On the strict stationarity condition (2.8))

1. When ω = 0 and γ < 0, it is clear that, in view of (2.9), the unique strictly stationarysolution is εt = 0. It is therefore natural to impose ω> 0.

2. It may be noted that the condition (2.8) depends on the distribution of ηt and that it is notsymmetric in α and β.

3. Examination of the proof below shows that the assumptions Eηt = 0 and Eη2t = 1,

which facilitate the interpretation of the model, are not necessary. It is sufficient to haveE log+ η2

t <∞.

Page 41: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 25

4. Condition (2.8) implies β < 1. Now, if

α + β < 1,

then (2.8) is satisfied since, by application of the Jensen inequality,

E log{a(ηt )} ≤ logE{a(ηt)} = log(α + β) < 0.

5. If (2.8) is satisfied, it is also satisfied for any pair (α1, β1) such that α1 ≤ α and β1 ≤ β. Inparticular, the strict stationarity of a given GARCH(1, 1) model implies that the ARCH(1)model obtained by canceling β is also stationary.

6. In the ARCH(1) case (β = 0), the strict stationarity constraint is written as

0 ≤ α < exp{−E(log η2t )}. (2.10)

For instance when ηt ∼ N(0, 1) the condition becomes α < 3.56. For a distribution suchthat E(log η2

t ) = −∞, for instance with a mass at 0, condition (2.10) is always satisfied. Forsuch distributions, a strictly stationary ARCH(1) solution exists whatever the value of α.

Proof of Theorem 2.1. Note that the coefficient γ = E log{a(ηt )} always exists in [−∞,+∞)

because E log+{a(ηt )} ≤ Ea(ηt ) = α + β. Using iteratively the second equation in model (2.7),we get, for N ≥ 1,

σ 2t = ω + a(ηt−1)σ

2t−1

= ω

[1+

N∑n=1

a(ηt−1) . . . a(ηt−n)

]+ a(ηt−1) . . . a(ηt−N−1)σ

2t−N−1

:= ht (N)+ a(ηt−1) . . . a(ηt−N−1)σ2t−N−1. (2.11)

The limit process ht = limN→∞ ht (N) exists in R+ = [0,+∞] since the summands are nonneg-

ative. Moreover, letting N go to infinity in ht (N) = ω + a(ηt−1)ht−1(N − 1), we get

ht = ω + a(ηt−1)ht−1.

We now show that ht is almost surely finite if and only if γ < 0.Suppose that γ < 0. We will use the Cauchy rule for series with nonnegative terms.3 We have

[a(ηt−1) . . . a(ηt−n)]1/n = exp

[1

n

n∑i=1

log{a(ηt−i )}]→ eγ a.s. (2.12)

as n→∞, by application of the strong law of large numbers to the iid sequence (log{a(ηt )}).4The series defined in (2.9) thus converges almost surely in R, by application of the Cauchy rule,

3 Let∑

an be a series with nonnegative terms and let λ = lim a1/nn . Then (i) if λ < 1 the series

∑an

converges, (ii) if λ> 1 the series∑

an diverges.4 If (Xi) is an iid sequence of random variables admitting an expectation, which can be infinite, then

1n

∑ni=1 Xi → EX1 a.s. This result, which can be found in Billingsley (1995), follows from the strong law of

large numbers for integrable variables: suppose, for instance, that E(X+i ) = +∞ and let, for any integer m> 0,Xi = X+i if 0 ≤ X+i ≤ m, and Xi = 0 otherwise. Then 1

n

∑ni=1 X

+i ≥ 1

n

∑ni=1 Xi → EX1, a.s., by application

of the strong law of large numbers to the sequence of integrable variables Xi . When m goes to infinity, theincreasing sequence EX1 converges to +∞, which allows us to conclude that 1

n

∑ni=1 X

+i →∞ a.s.

Page 42: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

26 GARCH MODELS

and the limit process (ht ) takes positive real values. It follows that the process (εt ) defined by

εt =√htηt =

{ω +

∞∑i=1

a(ηt−1) . . . a(ηt−i )ω

}1/2

ηt (2.13)

is strictly stationary and ergodic (see Appendix A.1, Theorem A.1). Moreover, (εt ) is a nonantic-ipative solution of model (2.7).

We now prove the uniqueness. Let εt = σtηt denote another strictly stationary solution. By(2.11) we have

σ 2t = ht (N)+ a(ηt−1) . . . a(ηt−N−1)σ

2t−N−1.

It follows that

σ 2t − ht = {ht (N)− ht } + a(ηt−1) . . . a(ηt−N−1)σ

2t−N−1.

The term in brackets on the right-hand side tends to 0 a.s. as N →∞. Moreover, sincethe series defining ht converges a.s., we have a(ηt−1) . . . a(ηt−n)→ 0 with probability 1 asn→∞. In addition, the distribution of σ 2

t−N−1 is independent of N by stationarity. Therefore,a(ηt−1) . . . a(ηt−N−1)σ

2t−N−1 → 0 in probability as N →∞. We have proved that σ 2

t − ht → 0in probability as N →∞. This term being independent of N , we necessarily have ht = σ 2

t forany t , a.s.

If γ > 0, from (2.12) and the Cauchy rule,∑N

n=1 a(ηt−1) . . . a(ηt−n)→+∞, a.s., as N →∞.Hence, if ω> 0, ht = +∞ a.s. By (2.11), it is clear that σ 2

t = +∞, a.s. It follows that there existsno almost surely finite solution to (2.7).

For γ = 0, we give a proof by contradiction. Suppose there exists a strictly stationary solution(εt , σ

2t ) of (2.7). We have, for n> 0,

σ 20 ≥ ω

{1+

n∑i=1

a(η−1) . . . a(η−i )

}

from which we deduce that a(η−1) . . . a(η−n)ω converges to zero, a.s., as n→∞, or, equivalently,that

n∑i=1

log a(ηi)+ logω→−∞ a.s. as n→∞. (2.14)

By the Chung–Fuchs theorem5 we have lim sup∑n

i=1 log a(ηi) = +∞ with probability 1, whichcontradicts (2.14). �

The next result shows that nonstationary GARCH processes are explosive.

Corollary 2.1 (Conditions of explosion) For the GARCH(1, 1) model defined by (2.7) for t ≥ 1,with initial conditions for ε0 and σ0,

γ > 0 �⇒ σ 2t →+∞, a.s. (t →∞).

If, in addition, E| log(η2t )| <∞, then

γ > 0 �⇒ ε2t →+∞, a.s. (t →∞).

5 If X1, . . . , Xn is an iid sequence such that EX1 = 0 and E|X1|> 0 then lim supn→∞∑n

i=1 Xi = +∞and lim infn→∞

∑ni=1 Xi = −∞ (see, for instance, Chow and Teicher, 1997).

Page 43: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 27

Proof. We have

σ 2t ≥ ω

{1+

t−1∑i=1

a(ηt−1) . . . a(ηt−i )

}≥ ωa(ηt−1) . . . a(η1). (2.15)

Hence,

lim inft→∞

1

tlog σ 2

t ≥ lim inft→∞

1

t

t−1∑i=1

log a(ηt−i ) = γ.

Thus log σ 2t →∞ and σ 2

t →∞ a.s. as γ > 0. By the same arguments,

lim inft→∞

1

tlog ε2

t = lim inft→∞

1

t(log σ 2

t + log η2t ) ≥ γ + lim inf

t→∞1

tlog η2

t = γ

using Exercise 2.11. The conclusion follows. �

Remark 2.3 (On Corollary 2.1)

1. When γ = 0, Kluppelberg, Lindner and Maller (2004) showed that σ 2t →∞ in probability.

2. Since, by Jensen’s inequality, we have E log(η2t ) <∞, the restriction E| log(η2

t )| <∞means E log(η2

t ) >−∞. In the ARCH(1) case this restriction vanishes because the conditionγ = E logαη2

t > 0 implies E log(η2t ) >−∞.

Theorem 2.2 (Second-order stationarity of the GARCH(1, 1) process) Let ω> 0. If α + β ≥1, a nonanticipative and second-order stationary solution to the GARCH(1, 1) model does not exist.If α + β < 1, the process (εt ) defined by (2.13) is second-order stationary. More precisely, (εt ) isa weak white noise. Moreover, there exists no other second-order stationary and nonanticipativesolution.

Proof. If (εt ) is a GARCH(1, 1) process, in the sense of Definition 2.1, which is second-orderstationary and nonanticipative, we have

E(ε2t ) = E

{E(ε2

t | εu, u < t)} = E(σ 2

t ) = ω + (α + β)E(ε2t−1),

that is,(1− α − β)E(ε2

t ) = ω.

Hence, we must have α + β < 1. In addition, we get E(ε2t ) > 0. Conversely, suppose α + β < 1.

By Remark 2.2(4), the strict stationarity condition is satisfied. It is thus sufficient to show thatthe strictly stationary solution defined in (2.13) admits a finite variance. The variable ht being anincreasing limit of positive random variables, the infinite sum and the expectation can be permutedto give

E(ε2t ) = E(ht ) =

[1+

+∞∑n=1

E{a(ηt−1) . . . a(ηt−n)}]ω

=[

1++∞∑n=1

{Ea(ηt )}n]ω

=[

1++∞∑n=1

(α + β)n

]ω = ω

1− (α + β).

Page 44: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

28 GARCH MODELS

This proves the second-order stationarity of the solution. Moreover, this solution is a white noisebecause E(εt ) = E {E(εt | εu, u < t)} = 0 and for all h> 0,

Cov(εt , εt−h) = E {εt−hE(εt | εu, u < t)} = 0.

Let εt =√ht ηt denote another second-order and nonanticipative stationary solution. We have

|ht − ht | = a(ηt−1) . . . a(ηt−n)|ht−n−1 − ht−n−1|,and then

E|ht − ht | = E{a(ηt−1) . . . a(ηt−n)}E|ht−n−1 − ht−n−1|= (α + β)nE|ht−n−1 − ht−n−1|.

Notice that the second equality uses the fact that the solutions are nonanticipative. This assumptionwas not necessary to establish the uniqueness of the strictly stationary solution. The expectation of|ht−n−1 − ht−n−1| being bounded by E|ht−n−1| + E|ht−n−1|, which is finite and independent of nby stationarity, and since (α + β)n tends to 0 when n→∞, we obtain E|ht − ht | = 0 and thusht = ht for all t , a.s. �

Figure 2.8 shows the zones of strict and second-order stationarity for the strong GARCH(1, 1)model when ηt ∼ N(0, 1). Note that the distribution of ηt only matters for the strict stationarity.As noted above, the frontier of the strict stationarity zone corresponds to a random walk (forthe process log(ht − ω)). A similar interpretation holds for the second-order stationarity zone. Ifα + β = 1 we have

ht = ω + ht−1 + αht−1(η2t−1 − 1).

Thus, since the last term in this equality is centered and uncorrelated with any variable belongingto the past of ht−1, the process (ht ) is a random walk. The corresponding GARCH process iscalled integrated GARCH (or IGARCH(1, 1)) and will be studied later: it is strictly stationary, hasan infinite variance, and a conditional variance which is a random walk (with a positive drift).

2.2.2 The General CaseIn the general case of a strong GARCH(p, q) process, the following vector representation will beuseful. We have

zt= bt + Atzt−1, (2.16)

0 1 2 3 4

1

1

b1

a1

3

2

Figure 2.8 Stationarity regions for the GARCH(1, 1) model when ηt ∼ N(0, 1): 1, second-orderstationarity; 1 and 2, strict stationarity; 3, nonstationarity.

Page 45: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 29

where

bt = b(ηt ) =

ωη2t

0...

ω

0...

0

∈ Rp+q, z

t=

ε2t

...

ε2t−q+1

σ 2t

...

σ 2t−p+1

∈ Rp+q ,

and

At =

α1η2t · · · αqη

2t β1η

2t · · · βpη

2t

1 0 · · · 0 0 · · · 00 1 · · · 0 0 · · · 0

.... . .

. . ....

.... . .

. . ....

0 . . . 1 0 0 . . . 0 0

α1 · · · αq β1 · · · βp

0 · · · 0 1 0 · · · 00 · · · 0 0 1 · · · 0

.... . .

. . ....

.... . .

. . ....

0 . . . 0 0 0 . . . 1 0

(2.17)

is a (p + q)× (p + q) matrix. In the ARCH(q) case, zt

reduces to ε2t and its q − 1 first past values,

and At to the upper-left block of the above matrix. Equation (2.16) defines a first-order vectorautoregressive model, with positive and iid matrix coefficients. The distribution of z

tconditional

on its infinite past coincides with its distribution conditional on zt−1 only, which means that (zt)

is a Markov process. Model (2.16) is thus called the Markov representation of the GARCH(p, q)model. Iterating (2.16) yields

zt= bt +

∞∑k=1

AtAt−1 . . . At−k+1bt−k, (2.18)

provided that the series exists almost surely. Finding conditions ensuring the existence of this seriesis the object of what follows. Notice that the existence of the right-hand vector in (2.18) does notensure that its components are positive. One sufficient condition for

bt +∞∑k=1

AtAt−1 . . . At−k+1bt−k > 0, a.s., (2.19)

in the sense that all the components of this vector are strictly positive (but possibly infinite), isthat

ω> 0, αi ≥ 0 (i = 1, . . . , q), βj ≥ 0 (j = 1, . . . , p). (2.20)

This condition is very simple to use but may not be necessary, as we will see in Section 2.3.2.

Strict Stationarity

The main tool for studying strict stationarity is the concept of the top Lyapunov exponent. Let A bea (p + q)× (p + q) matrix. The spectral radius of A, denoted by ρ(A), is defined as the greatest

Page 46: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

30 GARCH MODELS

modulus of its eigenvalues. Let ‖ · ‖ denote any norm on the space of the (p + q)× (p + q)

matrices. We have the following algebra result:

limt→∞

1

tlog ‖At‖ = log ρ(A) (2.21)

(Exercise 2.3). This property has the following extension to random matrices.

Theorem 2.3 Let {At , t ∈ Z} be a strictly stationary and ergodic sequence of random matrices,such that E log+ ‖At‖ is finite. We have

limt→∞

1

tE (log ‖AtAt−1 . . . A1‖) = γ = inf

t∈N∗1

tE(log ‖AtAt−1 . . . A1‖), (2.22)

γ is called the top Lyapunov exponent and exp(γ ) is called the spectral radius of the sequence ofmatrices {At, t ∈ Z}. Moreover,

γ = limt→∞ a.s.

1

tlog ‖AtAt−1 . . . A1‖. (2.23)

Remark 2.4 (On the top Lyapunov exponent γ )

1. It is always true that γ ≤ E(log ‖A1‖), with equality in dimension 1.

2. If At = A for all t ∈ Z, we have γ = logρ(A) in view of (2.21).

3. All norms on a finite-dimensional space being equivalent, it readily follows that γ is inde-pendent of the norm chosen.

4. The equivalence between the definitions of γ can be shown using Kingman’s subadditiveergodic theorem (see Kingman, 1973, Theorem 6). The characterization in (2.23) is particu-larly interesting because its allows us to evaluate this coefficient by simulation. Asymptoticconfidence intervals can also be obtained (see Goldsheid, 1991).

The following general lemma, which we shall state without proof (see Bougerol and Picard, 1992a,Lemma 3.4), is very useful for studying products of random matrices.

Lemma 2.1 Let {At , t ∈ Z} be an ergodic and strictly stationary sequence of random matricessuch that E log+ ‖At‖ is finite, endowed with a top Lyapunov exponent γ . Then

limt→∞ a.s. ‖A0 . . . A−t‖ = 0 ⇒ γ < 0. (2.24)

As for ARMA models, we are mostly interested in the nonanticipative solutions (εt ) to model(2.5), that is, those for which εt belongs to the σ -field generated by {ηt , ηt−1, . . .}.

Theorem 2.4 (Strict stationarity of the GARCH( p, q) model) A necessary and sufficient con-dition for the existence of a strictly stationary solution to the GARCH(p, q) model (2.5) is that

γ < 0,

where γ is the top Lyapunov exponent of the sequence {At, t ∈ Z} defined by (2.17). When thestrictly stationary solution exists, it is unique, nonanticipative and ergodic.

Proof. We shall use the norm defined by ‖A‖ =∑ |aij |. For convenience, the norm will bedenoted identically whatever the dimension of A. With this convention, this norm is clearly

Page 47: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 31

multiplicative: ‖AB‖ ≤ ‖A‖‖B‖ for all matrices A and B such that AB exists.6 Observe thatsince the variable ηt has a finite variance, the components of the matrix At are integrable. Hence,we have

E log+ ‖At‖ ≤ E‖At‖ <∞.

First suppose γ < 0. Then it follows from (2.23) that

zt(N) = bt +

N∑n=0

AtAt−1 . . . At−nbt−n−1

converges almost surely, when N goes to infinity, to some limit zt. Indeed, using the fact that the

norm is multiplicative,

‖zt(N)‖ ≤ ‖bt‖ +

∞∑n=0

‖AtAt−1 . . . At−n‖‖bt−n−1‖ (2.25)

and

‖At . . . At−n‖1/n‖bt−n−1‖1/n = exp

[1

nlog ‖At . . . At−n‖ + 1

nlog ‖bt−n−1‖

]a.s.−→ eγ < 1.

To show that n−1 log ‖bt−n−1‖ → 0 a.s. we have used the result proven in Exercise 2.11, whichcan be applied because

E| log ‖bt−n−1‖| ≤ | logω| + E log+ ‖bt−n−1‖ ≤ | logω| + E‖bt−n−1‖ <∞.

It follows that, by the Cauchy rule, zt

is well defined in (R∗+)p+q . Let zq+1,t denote the (q + 1)th

component of zt. Setting εt =

√zq+1,tηt , we define a solution of model (2.5). This solution is

nonanticipative since, by (2.18), εt can be expressed as a measurable function of ηt , ηt−1, . . . . ByTheorem A.1, together with the ergodicity (ηt ), this solution is also strictly stationary and ergodic.

The proof of the uniqueness parallels the arguments given in the case p = q = 1. Let (εt )denote a strictly stationary solution of model (2.5), or equivalently, let (z

t) denote a positive and

strictly stationary solution of (2.16). For all N ≥ 0,

zt= z

t(N)+ At . . . At−Nzt−N−1.

Then‖z

t− z

t‖ ≤ ‖z

t(N)− z

t‖ + ‖At . . . At−N‖‖zt−N−1‖.

The first term on the right-hand side tends to 0 a.s. as N →∞. In addition, because theseries defining z

tconverges a.s., we have ‖At . . . At−N‖ → 0 with probability 1 when n→∞.

Moreover, the distribution of ‖zt−N−1‖ is independent of N by stationarity. It follows that

‖At . . . At−N‖‖zt−N−1‖ → 0 in probability as N →∞. We have shown that zt− z

t→ 0 in

probability when N →∞. This quantity being independent of N , we necessarily have zt= z

tfor

any t , a.s.

6 Other examples of multiplicative norms are the Euclidean norm, ‖A‖ = {∑ a2ij }1/2 = {Tr(A′A)}1/2, and

the sup norm defined, for any matrix A of size d × d, by N(A) = sup {‖Ax‖; x ∈ Rd , ‖x‖ ≤ 1} where‖x‖ =∑ |xi |. A nonmultiplicative norm is N1 defined by N1(A) = max |aij |.

Page 48: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

32 GARCH MODELS

Next we establish the necessary part. From Lemma 2.1, it suffices to prove (2.24). We shallshow that, for 1 ≤ i ≤ p + q,

limt→∞A0 . . . A−t ei = 0, a.s., (2.26)

where ei is the ith element of the canonical base of Rp+q . Let (εt ) be a strictly stationary solutionof (2.5) and let (z

t) be defined by (2.16). We have, for t > 0,

z0 = b0 + A0z−1

= b0 +t−1∑k=0

A0 . . . A−kb−k−1 + A0 . . . A−t z−t−1

≥t−1∑k=0

A0 . . . A−kb−k−1 (2.27)

because the coefficients of the matrices At , b0 and zt

are nonnegative.7 It follows that the series∑t−1k=0 A0 . . . A−kb−k−1 converges and thus A0 . . . A−kb−k−1 tends almost surely to 0 as k→∞.

But since b−k−1 = ωη2−k−1e1 + ωeq+1, it follows that A0 . . . A−kb−k−1 can be decomposed into

two positive terms, and we have

limk→∞

A0 . . . A−kωη2−k−1e1 = 0, lim

k→∞A0 . . . A−kωeq+1 = 0, a.s. (2.28)

Since ω = 0, (2.26) holds for i = q + 1. Now we use the equality

A−keq+i = βiη2−ke1 + βieq+1 + eq+i+1, i = 1, . . . , p, (2.29)

with ep+q+1 = 0 by convention. For i = 1, this equality gives

0 = limt→∞A0 . . . A−keq+1 ≥ lim

k→∞A0 . . . A−k+1eq+2 ≥ 0,

hence (2.26) is true for i = q + 2, and, by induction, it is also true for i = q + j, j = 1, . . . , pusing (2.29). Moreover, we note that A−keq = αqη

2−ke1 + αqeq+1, which allows us to see that,

from (2.28), (2.26) holds for i = q. For the other values of i the conclusion follows from

A−kei = αiη2−ke1 + αieq+1 + ei+1, i = 1, . . . , q − 1,

and an ascending recursion. The proof of Theorem 2.4 is now complete. �

Remark 2.5 (On Theorem 2.4 and its proof)

1. This theorem shows, in particular, that the stationary solution of the strong GARCH modelis a semi-strong GARCH process, in the sense of Definition 2.1. The converse is not true,however (see the example of Section 4.1.1 below).

2. Bougerol and Picard (1992b) use a more parsimonious vector representation of theGARCH(p, q) model, based on the vector z∗

t= (σ 2

t , . . . , σ2t−p+1, ε

2t−1, . . . , ε

2t−q+1)

′ ∈Rp+q+1 (Exercise 2.6). However, a drawback of this representation is that it is only definedfor p ≥ 1 and q ≥ 2.

7 Here, and in the sequel, we use the notation x ≥ y, meaning that all the components of the vector x aregreater than, or equal to, those of the vector y.

Page 49: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 33

3. An analogous proof uses the following Markov vector representation based on (2.6):

ht = ω + Btht−1, (2.30)

with ω = (ω, 0, . . . , 0)′ ∈ Rr , ht = (σ 2t , . . . , σ

2t−r+1) ∈ Rr and

Bt =(

a1(ηt−1) . . . ar (ηt−r )Ir−1 0

),

where Ir−1 is the identity matrix of size r − 1. Note that, unlike the At , the matrices Bt arenot independent. It is worth noting (Exercise 2.12), however, that

E

n∏t=0

Bt =n∏t=0

EBt . (2.31)

The independence of the matrices At has been explicitly used in the proof of the necessarypart of the previous theorem, because it was required to apply Lemma 2.1. Moreover, theindependence of the At will be crucial to obtain conditions for the moments existence. Weshall see, however, that the representation (2.30) may be more convenient than (2.16) forderiving other properties such as the mixing properties (see Chapter 3).

4. To verify the condition γ < 0, it is sufficient to check that

E(log ‖AtAt−1 . . . A1‖) < 0

for some t > 0.

5. If a GARCH model admits a strictly stationary solution, any other GARCH model obtainedby replacing the αi and βj by smaller coefficients will also admit a strictly stationarysolution. Indeed, the coefficient γ of the latter model will be smaller than that of the initialmodel because, with the norm chosen, 0 ≤ A ≤ B implies ‖A‖ ≤ ‖B‖. In particular, thestrict stationarity of a given GARCH process entails that the ARCH process obtained bycanceling the coefficients βj is also strictly stationary.

6. We emphasize the fact that any strictly stationary solution of a GARCH model is nonantic-ipative. This is an important difference with ARMA models, for which strictly stationarysolutions depending on both past and future values of the noise exist.

The following result provides a simple necessary condition for strict stationarity, in three differentforms.

Corollary 2.2 (Consequences of strict stationarity) Let γ be the top Lyapunov exponent of thesequence {At , t ∈ Z} defined in (2.17). If γ < 0, we have the following equivalent properties:

(a)∑p

j=1 βj < 1;

(b) 1− β1z− · · · − βpzp = 0 ⇒ |z|> 1;

(c) ρ(B) < 1, where B is the submatrix of At defined by

B =

β1 β2 · · · βp1 0 · · · 00 1 · · · 0...

. . .. . .

...

0 · · · 1 0

.

Page 50: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

34 GARCH MODELS

Proof. Since all the coefficients of the matrices At are nonnegative, it is clear that γ is larger thanthe top Lyapunov exponent of the constant sequence obtained by replacing by 0 the coefficientsof the first q rows and of the first q columns of the matrices At . But the matrix obtained in thisway has the same nonzero eigenvalues as B, and thus has the same spectral radius as B. In viewof Remark 2.4(2), it can be seen that

γ ≥ log ρ(B).

It follows that γ < 0 ⇒ (c). It is easy to show (by induction on p and by computing the determinantwith respect to the last column) that, for λ = 0,

det(λIp − B) = λp − λp−1β1 − · · · − λβp−1 − βp = λpB(

1

λ

), (2.32)

where B(z) = 1− β1z− · · · − βpzp. The equivalence between (b) and (c) is then straightforward.

Next we prove that (a)⇔ (b). We have B(0) = 1 and B(1) = 1−∑p

j=1 βj . Hence, if∑p

j=1 βj ≥ 1then B(1) ≤ 0 and, by a continuity argument, there exists a root of B in (0, 1]. Thus (b) ⇒ (a).Conversely, if

∑p

j=1 βj < 1 and B(z0) = 0 for a z0 of modulus less than or equal to 1, then

1 =∑p

j=1 βjzj

0 =∣∣∣∑p

j=1 βj zj

0

∣∣∣ ≤∑p

j=1 βj |z0|j ≤∑p

j=1 βj , which is impossible. It follows that(a) ⇒ (b) and the proof of the corollary is complete. �

We now give two illustrations allowing us to obtain more explicit stationarity conditions thanin the theorem.

Example 2.1 (GARCH(1, 1)) In the GARCH(1, 1) case, we retrieve the strict stationarity con-dition already obtained. The matrix At is written in this case as

At = (η2t , 1)′(α1, β1).

We thus have

AtAt−1 . . . A1 =t−1∏k=1

(α1η2t−k + β1)At .

It follows that

log ‖AtAt−1 . . . A1‖ =t−1∑k=1

log(α1η2t−k + β1)+ log ‖At‖

and, in view of (2.23) and by the strong law of large numbers, γ = E log(α1η2t + β1). The necessary

and sufficient condition for the strict stationarity is then E log(α1η2t + β1) < 0, as obtained above.

Example 2.2 (ARCH(2)) For an ARCH(2) model, the matrix At takes the form

At =(

α1η2t α2η

2t

1 0

)and the stationarity region can be evaluated by simulation. Table 2.1 shows, for different valuesof the coefficients α1 and α2, the empirical means and the standard deviations (in parentheses)obtained for 1000 simulations of size 1000 of γ = 1

1000 log ‖A1000A999 . . . A1‖. The ηt ’s have beensimulated from a N(0, 1) distribution. Note that in the ARCH(1) case, simulation provides a goodapproximation of the condition α1 < 3.56, which was obtained analytically. Apart from this case,there exists no explicit strict stationarity condition in terms of the coefficients α1 and α2.

Figure 2.9, constructed from these simulations, gives a more precise idea of the strict stationarityregion for an ARCH(2) process. We shall establish in Corollary 2.3 a result showing that any strictlystationary GARCH process admits small-order moments. We begin with two lemmas which are ofindependent interest.

Page 51: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 35

Table 2.1 Estimations of γ obtained from 1000 simulations of size 1000 in the ARCH(2) case.

α1

α2 0.25 0.3 1 1.2 1.7 1.8 3.4 3.5 3.6

0 – – – – – – −0.049 −0.018 0.010(0.071) (0.071) (0.071)

0.5 – – – −0.175 −0.021 0.006 – – –(0.040) (0.042) (.044)

1 – – −0.011 0.046 – – – – –(0.038) (0.038)

1.75 −0.015 0.001 – – – – – – –(0.035) (0.032)

00

1

2

3

1

1

2

3

2a1

a2

3

Figure 2.9 Stationarity regions for the ARCH(2) model: 1, second-order stationarity; 1 and 2,strict stationarity; 3, non-stationarity.

Lemma 2.2 Let X denote an almost surely positive real random variable. If EXr <∞ for somer > 0 and if E logX < 0, then there exists s > 0 such that EXs < 1.

Proof. The moment-generating function of Y = logX is defined by M(u) = EeuY = EXu. Thefunction M is defined and continuous on [0, r] and we have, for u> 0,

M(u)−M(0)

u=∫g(u, y) dPY (y)

with

g(u, y) = euy − 1

u↑ y when u ↓ 0.

By Beppo Levi’s theorem, the right derivative of M at 0 is∫ydPY (y) = E(logX) < 0.

Since M(0) = 1, there exists s > 0 such that M(s) = EXs < 1. �

Page 52: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

36 GARCH MODELS

The following result, which is stated for any sequence of positive iid matrices, provides anothercharacterization of the strict stationarity of GARCH models.

Lemma 2.3 Let {At } be an iid sequence of positive matrices, with top Lyapunov exponent γ . Then

γ < 0 ⇐⇒ ∃s > 0, ∃k0 ≥ 1, δ := E(‖Ak0Ak0−1 . . . A1‖s) < 1.

Proof. Suppose γ < 0. Since γ = infk 1kE(log ‖AkAk−1 . . . A1‖), there exists k0 ≥ 1 such that

E(log ‖Ak0Ak0−1 . . . A1‖) < 0. Moreover,

E(‖Ak0Ak0−1 . . . A1‖) = ‖E(Ak0Ak0−1 . . . A1)‖= ‖(EA1)

k0‖≤ (E‖A1‖)k0 <∞ (2.33)

using the multiplicative norm ‖A‖ =∑i,j |A(i, j)|, the positivity of the elements of the Ai , andthe independence and equidistribution of the Ai . We conclude, concerning the direct implication,by using Lemma 2.2. The converse does not use the fact that the sequence is iid. If there exists > 0 and k0 ≥ 1 such that δ < 1, we have, by Jensen’s inequality,

γ ≤ 1

k0E(log ‖Ak0Ak0−1 . . . A1‖) ≤ 1

sk0log δ < 0.

Corollary 2.3 Let γ denote the top Lyapunov exponent of the sequence (At ) defined in (2.17).Then

γ < 0 �⇒ ∃s > 0, Eσ 2st <∞, Eε2s

t <∞

where εt = σtηt is the strictly stationary solution of the GARCH(p, q) model (2.5).

Proof. The proof of Lemma 2.3 shows that the real s involved in the two previous lemmas can betaken to be less than 1. For s ∈ (0, 1], a, b> 0 we have

(a

a+b)s + ( b

a+b)s ≥ 1 and, consequently,(∑

i ui)s ≤∑i u

si for any sequence of positive numbers ui . Using this inequality, together with

arguments already used (in particular, the fact that the norm is multiplicative), we deduce that thestationary solution defined in (2.18) satisfies

E‖zt‖s ≤ ‖Eb1‖s

{1+

∞∑k=0

δkk0∑i=1

{E‖A1‖s}i}<∞

where δ is defined in Lemma 2.3. We conclude by noting that σ 2st ≤ ‖z

t‖s and ε2s

t ≤ ‖zt‖s . �

Using Lemma 2.3 and Corollary 2.3 together, it can be seen that for s ∈ (0, 1],

∃k0 ≥ 1, E(‖Ak0Ak0−1 . . . A1‖s) < 1 �⇒ Eε2st <∞. (2.34)

The converse is generally true. For instance, we have for s ∈ (0, 1],

α1 + β1 > 0, Eε2st <∞ �⇒ lim

k→∞E(‖AkAk . . . A1‖s) = 0 (2.35)

(Exercise 2.13).

Page 53: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 37

Second-Order Stationarity

The following theorem gives necessary and sufficient second-order stationarity conditions.

Theorem 2.5 (Second-order stationarity) If there exists a GARCH(p, q) process, in the senseof Definition 2.1, which is second-order stationary and nonanticipative, and if ω> 0, then

q∑i=1

αi +p∑j=1

βi < 1. (2.36)

Conversely, if (2.36) holds, the unique strictly stationary solution of model (2.5) is a weak whitenoise (and thus is second-order stationary). In addition, there exists no other second-order station-ary solution.

Proof. We first show that condition (2.36) is necessary. Let (εt ) be a second-order stationary andnonanticipative GARCH(p, q) process. Then

E(ε2t ) = E

{E(ε2

t | εu, u < t)} = E(σ 2

t )

is a positive real number which does not depend on t . Taking the expectation of both sides of(2.1), we thus have

E(ε2t ) = ω +

q∑i=1

αiE(ε2t )+

p∑j=1

βjE(ε2t ),

that is, 1−q∑i=1

αi −p∑j=1

βj

E(ε2t ) = ω. (2.37)

Since ω is strictly positive, we must have (2.36).Now suppose that (2.36) holds true and let us construct a stationary GARCH solution (in the

sense of Definition 2.2). For t, k ∈ Z, we define Rd -valued vectors as follows:

Zk(t) ={

0 if k < 0bt + AtZk−1(t − 1) if k ≥ 0.

We have

Zk(t)− Zk−1(t) = 0 if k < 0

bt if k = 0At {Zk−1(t − 1)− Zk−2(t − 1)} if k > 0.

By iterating these relations we get, for k > 0,

Zk(t)− Zk−1(t) = AtAt−1 . . . At−k+1bt−k.

On the other hand, for the norm ‖C‖ =∑i,j |cij |, we have, for any random matrix C with positivecoefficients, E‖C‖ = E

∑i,j |cij | = E

∑i,j cij = ‖E(C)‖. Hence, for k > 0,

E‖Zk(t)− Zk−1(t)‖ = ‖E(AtAt−1 . . . At−k+1bt−k)‖,

because the matrix AtAt−1 . . . At−k+1bt−k is positive. All terms of the product AtAt−1 . . .

At−k+1bt−k are independent (because the process (ηt ) is iid and every term of the product is

Page 54: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

38 GARCH MODELS

function of a variable ηt−i , the dates t − i being distinct). Moreover, A := E(At) and b = E(bt )

do not depend on t . Finally, for k > 0,

E‖Zk(t)− Zk−1(t)‖ = ‖Akb‖ = (1, . . . , 1)Akb

because all terms of the vector Akb are positive.Condition (2.36) implies that the modulus of the eigenvalues of A are strictly less than 1.

Indeed, one can verify (Exercise 2.10) that

det(λIp+q − A) = λp+q1−

q∑i=1

αiλ−i −

p∑j=1

βjλ−j . (2.38)

Thus if |λ| ≥ 1, using the inequality |a − b| ≥ |a| − |b|, we get

|det(λIp+q − A)| ≥∣∣∣∣∣∣1−

q∑i=1

αiλ−i −

p∑j=1

βjλ−j

∣∣∣∣∣∣ ≥ 1−q∑i=1

αi −p∑

j=1

βj > 0,

and then ρ(A) < 1. It follows that, in view of the Jordan decomposition, or using (2.21), the con-vergence Ak → 0 holds at exponential rate when k→∞. Hence, for any fixed t , Zk(t) convergesboth in the L1 sense, using Cauchy’s criterion, and almost surely as k→∞. Let z

tdenote the

limit of (Zk(t))k . At fixed k, the process (Zk(t))t∈Z is strictly stationary. The limit process (zt) is

thus strictly stationary. Finally, it is clear that zt

is a solution of equation (2.16).The uniqueness can be shown as in the case p = q = 1, using the representation (2.30). �

Remark 2.6 (On the second-order stationarity of GARCH)

1. Under the conditions of Theorem 2.5, the unique stationary solution of model (2.5) is, using(2.37), a white noise of variance

Var(εt ) = ω

1−∑q

i=1 αi −∑p

j=1 βj.

2. Because the conditions in Theorems 2.4 and 2.5 are necessary and sufficient, we necessarilyhave q∑

i=1

αi +p∑j=1

βi < 1

⇒ γ < 0,

since the second-order stationary solution of Theorem 2.5 is also strictly stationary. Onecan directly check this implication by noting that if (2.36) is true, the previous proof showsthat the spectral radius ρ(EAt ) is strictly less than 1. Moreover, using a result by Kestenand Spitzer (1984, (1.4)), we always have

γ ≤ log ρ(EAt ). (2.39)

IGARCH(p, q) Processes

Whenq∑i=1

αi +p∑j=1

βj = 1

the model is called an integrated GARCH(p, q) or IGARCH(p, q) model (see Engle and Boller-slev, 1986). This name is justified by the existence of a unit root in the autoregressive part ofrepresentation (2.4) and is introduced by analogy with the integrated ARMA models, or ARIMA.However, this analogy can be misleading: there exists no (strict or second-order) stationary solution

Page 55: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 39

of an ARIMA model, whereas an IGARCH model admits a strictly stationary solution under verygeneral conditions. In the univariate case (p = q = 1), the latter property is easily shown.

Corollary 2.4 (Strict stationarity of IGARCH(1, 1)) If P [η2t = 1] < 1 and if α + β = 1,

model (2.7) admits a unique strictly stationary solution.

Proof. Recall that in the case p = q = 1, the matrices At can be replaced by a(ηt ) = αη2t + β.

Hence, γ = E log a(ηt ) ≤ logE{a(ηt )} = 0. The inequality is strict unless if a(ηt ) is a.s. con-stant. Since E{a(ηt )} = 1, this constant can only be equal to 1. Thus η2

t = 1 a.s., which isexcluded. �

This property extends to the general case under slightly more restrictive conditions on the lawof ηt .

Corollary 2.5 Suppose that the distribution of ηt has an unbounded support and has no mass at0. Then, if

∑q

i=1 αi +∑p

j=1 βj = 1, model (2.5) admits a unique strictly stationary solution.

Proof. It is not difficult to show from (2.38) that the spectral radius ρ(A) of the matrix A = EAt

is equal to 1 (Exercise 2.10). It can be shown that the assumptions on the distribution of ηt implythat inequality (2.39) is strict, that is, γ < log ρ(A) (see Kesten and Spitzer, 1984, Theorem 2;Bougerol and Picard, 1992b, Corollary 2.2). This allows us to conclude by Theorem 2.8. �

Note that this strictly stationary solution has an infinite variance in view of Theorem 2.5.

2.3 ARCH(∞) Representation∗

A process (εt ) is called an ARCH(∞) process if there exists a sequence of iid variables (ηt )

such that E(ηt) = 0 and E(η2t ) = 1, and a sequence of constants φi ≥ 0, i = 1, . . . , and φ0 > 0

such that

εt = σtηt , σ 2t = φ0 +

∞∑i=1

φiε2t−i . (2.40)

This class obviously contains the ARCH(q) process and we shall see that it more generally containsthe GARCH(p, q) process.

2.3.1 Existence ConditionsThe existence of a stationary ARCH(∞) process requires assumptions on the sequences (φi) and(ηt ). The following result gives an existence condition.

Theorem 2.6 (Existence of a stationary ARCH(∞) solution) For any s ∈ (0, 1], let

As =∞∑i=1

φsi and µ2s = E|ηt |2s .

Then, if there exists s ∈ (0, 1] such that

Asµ2s < 1, (2.41)

then there exists a strictly stationary and nonanticipative solution to model (2.40), given by

εt = σtηt , σ 2t = φ0 + φ0

∞∑k=1

∑i1,...ik≥1

φi1 . . . φik η2t−i1η

2t−i1−i2 . . . η

2t−i1−···−ik . (2.42)

Page 56: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

40 GARCH MODELS

The process (εt ) defined by (2.42) is the unique strictly stationary and nonanticipative solution ofmodel (2.40) such that E|εt |2s <∞.

Proof. Consider the random variable

St = φ0 + φ0

∞∑k=1

∑i1,...ik≥1

φi1 . . . φik η2t−i1 . . . η

2t−i1−···−ik , (2.43)

taking values in [0,+∞]. Since s ∈ (0, 1], applying the inequality (a + b)s ≤ as + bs for a, b ≥ 0gives

Sst ≤ φs0 + φs0

∞∑k=1

∑i1,...ik≥1

φsi1 . . . φsikη2st−i1 . . . η

2st−i1−···−ik .

Using the independence of the ηt , it follows that

ESst ≤ φs0 + φs0

∞∑k=1

∑i1,...ik≥1

φsi1 . . . φsikE(η2s

t−i1 . . . η2st−i1−···−ik )

= φs0

(1+

∞∑k=1

(Asµ2s )k

)= φs0

1− Asµ2s. (2.44)

This shows that St is almost surely finite. All the summands being positive, we have

∞∑i=1

φiSt−iη2t−i = φ0

∞∑i0=1

φi0η2t−i0

+ φ0

∞∑i0=1

φi0η2t−i0

∞∑k=1

∑i1,...ik≥1

φi1 . . . φik η2t−i0−i1 . . . η

2t−i0−i1−···−ik

= φ0

∞∑k=0

∑i0,...ik≥1

φi0 . . . φik η2t−i0 . . . η

2t−i0−···−ik .

Therefore, the following recursive equation holds:

St = φ0 +∞∑i=1

φiSt−iη2t−i .

A strictly stationary and nonanticipative solution of (2.40) is then obtained by setting εt = S1/2t ηt .

Moreover, E|εt |2s ≤ µ2sφs0/(1− Asµ2s ) in view of (2.44). Now denote by (εt ) any strictly sta-

tionary and nonanticipative solution of model (2.40), such that E|εt |2s <∞. For all q ≥ 1, by q

successive substitutions of the ε2t−i we get

σ 2t = φ0 + φ0

q∑k=1

∑i1,...ik≥1

φi1 . . . φik η2t−i1 . . . η

2t−i1−···−ik

+∑

i1,...iq+1≥1

φi1 . . . φiq+1η2t−i1 . . . η

2t−i1−···−iq ε

2t−i1−···−iq+1

:= St,q + Rt,q .

Page 57: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 41

Note that St,q → St a.s. when q →∞, where St is defined in (2.43). Moreover, because thesolution is nonanticipative, εt is independent of ηt ′ for all t ′> t . Hence,

ERst,q ≤

∑i1,...iq+1≥1

φsi1 . . . φsiq+1

E(|ηt−i1 |2s . . . |ηt−i1−···−iq |2s |εt−i1−···−iq+1 |2s)

= (Asµ2s )qAsE|εt |2s .

Thus∑

q≥1 ERst,q <∞ since Asµ2s < 1. Finally, Rt,q → 0 a.s. when q →∞, which implies

σ 2t = St a.s. �

Equality (2.42) is called a Volterra expansion (see Priestley, 1988). It shows in particular that,under the conditions of the theorem, if φ0 = 0, the unique strictly stationary and nonanticipativesolution of the ARCH(∞) model is the identically null sequence. An application of Theorem 2.6,obtained for s = 1, is that the condition

A1 =∞∑i=1

φi < 1

ensures the existence of a second-order stationary ARCH(∞) process. If

(Eη4t )

1/2∞∑i=1

φi < 1,

it can be shown (see Giraitis, Kokoszka and Leipus, 2000) that Eε4t <∞, Cov(ε2

t , ε2t−h)> 0 for

all h and

+∞∑h=−∞

Cov(ε2t , ε

2t−h) <∞. (2.45)

The fact that the squares have positive autocovariances will be verified later in the GARCH case(see Section 2.5). In contrast to GARCH processes, for which they decrease at exponential rate, theautocovariances of the squares of ARCH(∞) can decrease at the rate h−γ with γ > 1 arbitrarilyclose to 1. A strictly stationary process of the form (2.40) such that

A1 =∞∑i=1

φi = 1

is called integrated ARCH(∞), or IARCH(∞). Notice that an IARCH(∞) process has infi-nite variance. Indeed, if Eε2

t = σ 2 <∞, then, by (2.40), σ 2 = φ0 + σ 2, which is impossible.From Theorem 2.8, the strictly stationary solutions of IGARCH models (see Corollary 2.5) admitIARCH(∞) representations. The next result provides a condition for the existence of IARCH(∞)processes.

Theorem 2.7 (Existence of IARCH(∞) processes) If A1 = 1, if η2t has a nondegenerate dis-

tribution, if E| log η2t | <∞ and if, for some r > 1,

∑∞i=1 φir

i <∞, then there exists a strictlystationary and nonanticipative solution to model (2.40) given by (2.42).

Other integrated processes are the long-memory ARCH, for which the rate of decrease of the φiis not geometric.

Page 58: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

42 GARCH MODELS

2.3.2 ARCH(∞) Representation of a GARCHIt is sometimes useful to consider the ARCH(∞) representation of a GARCH(p, q) process. Forinstance, this representation allows the conditional variance σ 2

t of εt to be written explicitly as afunction of its infinite past. It also allows the positivity conditions (2.20) on the coefficients to beweakened. Let us first consider the GARCH(1, 1) model. If β < 1, we have

σ 2t =

ω

1− β+ α

∞∑i=1

βi−1ε2t−i−1. (2.46)

In this case we have

As = αs∞∑i=1

β(i−1)s = αs

1− βs.

The condition Asµ2s < 1 thus takes the form

αsµ2s + βs < 1, for some s ∈ (0, 1].

For example, if α + β < 1 this condition is satisfied for s = 1. However, second-order stationar-ity is not necessary for the validity of (2.46). Indeed, if (εt ) denotes the strictly stationary andnonanticipative solution of the GARCH(1, 1) model, then, for any q ≥ 1,

σ 2t = ω

q∑i=1

βi−1 + α

q∑i=1

βi−1ε2t−i + βqσ 2

t−q . (2.47)

By Corollary 2.3 there exists s ∈ (0, 1[ such that E(σ 2st ) = c <∞. It follows that∑

q≥1 E(βqσ 2

t−q)s = βc/(1− β) <∞. So βqσ 2t−q converges a.s. to 0 and, by letting q

go to infinity in (2.47), we get (2.46). More generally, we have the following property.

Theorem 2.8 (ARCH(∞) representation of a GARCH(p, q)) If (εt ) is the strictly stationaryand nonanticipative solution of model (2.5), it admits an ARCH(∞) representation of the form(2.40). The constants φi are given by

φ0 = ω

B(1) ,∞∑i=1

φizi = A(z)

B(z) , z ∈ C, |z| ≤ 1, (2.48)

where A(z) = α1z+ · · · + αqzq and B(z) = 1− β1z− · · · − βpz

p.

Proof. Rewrite the model in vector form as

σ 2t = Bσ 2

t−1 + ct ,

where σ 2t = (σ 2

t , . . . , σ2t−p+1)

′, ct = (ω +∑q

i=1 αiε2t−i , 0, . . . , 0)′ and B is the matrix defined in

Corollary 2.2. This corollary shows that, under the strict stationarity condition, we have ρ(B) < 1.Moreover, E‖ct‖s <∞ by Corollary 2.3. Consequently, the components of the vector

∑∞i=0 B

ict−iare almost surely real-valued. We thus have

σ 2t = e′

∞∑i=0

Bict−i , e = (1, 0, . . . , 0)′.

All that remains is to note that the coefficients obtained in this ARCH(∞) representation coincidewith those of (2.48). �

Page 59: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 43

The ARCH(∞) representation can be used to weaken the conditions (2.20) imposed to ensure thepositivity of σ 2

t . Consider the GARCH(p, q) model with ω> 0, without any a priori positivityconstraint on the coefficients αi and βj , and assuming that the roots of the polynomial B(z) =1− β1z− · · · − βpz

p have moduli strictly greater than 1. The coefficients φi introduced in (2.48)are then well defined, and under the assumption

φi ≥ 0, i = 1, . . . , (2.49)

we have

σ 2t = φ0 +

∞∑i=1

φiε2t−i ∈ (0,+∞].

Indeed, φ0 > 0 because otherwise B(1) ≤ 0, which would imply the existence of a root inside theunit circle since B(0) = 1. Moreover, the proofs of the sufficient parts of Theorem 2.4, Lemma 2.3and Corollary 2.3 do not use the positivity of the coefficients of the matrices At . It follows thatif the top Lyapunov exponent of the sequence (At ) is such that γ < 0, the variable σ 2

t is a.s.finite-valued. To summarize, the conditions

ω> 0, γ < 0, φi ≥ 0, i ≥ 1 and (B(z) = 0 ⇒ |z|> 1)

imply that there exists a strictly stationary and nonanticipative solution to model (2.5). The condi-tions (2.49) are not generally simple to use, however, because they imply an infinity of constraintson the coefficients αi and βj . In the ARCH(q) case, they reduce to the conditions (2.20), thatis, αi ≥ 0, i = 1, . . . , q. Similarly, in the GARCH(1, 1) case, it is necessary to have α1 ≥ 0 andβ1 ≥ 0. However, for p ≥ 1 and q > 1, the conditions (2.20) can be weakened (Exercise 2.14).

2.3.3 Long-Memory ARCHThe introduction of long memory into the volatility can be motivated by the observation that theempirical autocorrelations of the squares, or of the absolute values, of financial series decay veryslowly in general (see for example Table 2.1). We shall see that it is possible to reproduce thisproperty by introducing ARCH(∞) processes, with a sufficiently slow decay of the modulus ofthe coefficients φi .8 A process (Xt ) is said to have long memory if it is second-order stationaryand satisfies, for h→∞,

|Cov(Xt ,Xt−h)| ∼ Kh2d−1, where d <1

2(2.50)

and K is a nonzero constant. An alternative definition relies on distinguishing ‘intermediate-memory’ processes for which d < 0 and thus

∑+∞h=−∞ |Cov(Xt ,Xt−h)| <∞, and ‘long-memory’

processes for which d ∈ (0, 1/2[ and thus∑+∞

h=−∞ |Cov(Xt ,Xt−h)| = ∞ (see Brockwell andDavis, 1991, p. 520). The autocorrelations of an ARMA process decrease at exponential ratewhen the lag increases. The need for processes with a slower autocovariance decay leads to theintroduction of the fractionary ARIMA models. These models are defined through the fractionarydifference operator

(1− B)d = 1+∞∑j=1

(−d)(1− d) . . . (j − 1− d)

j !Bj , d >−1.

8 The slow decrease of the empirical autocorrelations can also be explained by nonstationarity phenomena,such as structural changes or breaks (see, for instance, Mikosch and Starica, 2004).

Page 60: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

44 GARCH MODELS

Denoting by πj the coefficient of Bj in this sum, it can be shown that πj ∼ Kj−d−1 when j →∞,where K is a constant depending on d. An ARIMA(p, d, q) process with d ∈ (−0.5, 0.5[ is definedas a stationary solution of

ψ(B)(1− B)dXt = θ(B)εt

where εt is a white noise, and ψ and θ are polynomials of degree p and q, respectively. If the rootsof these polynomials are all outside the unit disk, the unique stationary and purely deterministicsolution is causal, invertible and its covariances satisfy (2.50) (see Brockwell and Davis, 1991,Theorem 13.2.2.). By analogy with the ARIMA models, the class of FIGARCH(p, d, q) processesis defined by the equations

εt = σtηt , σ 2t = φ0 +

{1− (1− B)d

θ(B)

ψ(B)

}ε2t−i , d ∈ (0, 1[, φ0 > 0, (2.51)

where ψ and θ are polynomials of degree p and q respectively, such that ψ(0) = θ(0) = 1, theroots of ψ have moduli strictly greater than 1 and φi ≥ 0, where the φi are defined by

∞∑i=1

φizi = 1− (1− z)d

θ(z)

ψ(z).

We have φi ∼ Ki−d−1, where K is a positive constant, when i →∞ and∑∞

i=1 φi = 1. The processintroduced in (2.51) is thus an IARCH(∞), provided it exists. Note that existence of this processcannot be obtained by Theorem 2.7 because, for the FIGARCH model, the φi decrease more slowlythan the geometric rate. The following result, which is a consequence of Theorem 2.6, and whoseproof is the subject of Exercise 2.20, provides another sufficient condition for the existence ofIARCH(∞) processes.

Corollary 2.6 (Existence of some FIGARCH processes) IfA1 = 1, then condition (2.41) is sat-isfied if and only if there exists p∗ ∈ (0, 1] such that Ap∗ <∞ and

∞∑i=1

φi logφi + E(η20 log η2

0) ∈ (0,+∞]. (2.52)

The strictly stationary and nonanticipative solution of model (2.40) is thus given by (2.42) and issuch that E|εt |q <∞, for any q ∈ [0, 2[, and Eε2

t = ∞.

This result can be used to prove the existence of FIGARCH(p, d, q) processes for d ∈ (0, 1[ suffi-ciently close to 1, if the distribution of η2

0 is assumed to be nondegenerate (hence, E(η20 log η2

0)> 0);see Douc, Roueff and Soulier (2008). The FIGARCH process of Corollary 2.6 does not admit afinite second-order moment. Its square is thus not a long-memory process in the sense of definition(2.50). More generally, it can be shown that the squares of the ARCH(∞) processes do not havethe long-memory property. This motivated the introduction of an alternative class, called linearARCH (LARCH) and defined by

εt = σtηt , σt = b0 +∞∑i=0

biεt−i , ηt iid (0, 1). (2.53)

Under appropriate conditions, this model is compatible with the long-memory property for ε2t .

Page 61: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 45

2.4 Properties of the Marginal Distribution

We have seen that, under quite simple conditions, a GARCH(p, q) model admits a strictly stationarysolution (εt ). However, the marginal distribution of the process (εt ) is never known explicitly. Theaim of this section is to highlight some properties of this distribution through the marginal moments.

2.4.1 Even-Order MomentsWe are interested in finding existence conditions for the moments of order 2m, where m is anypositive integer.9 Let ⊗ denote the tensor product, or Kronecker product, and recall that it isdefined as follows: for any matrices A = (aij ) and B, we have A⊗ B = (aijB). For any matrixA, let A⊗m = A⊗ · · · ⊗ A. We have the following result.

Theorem 2.9 (2mth-order stationarity) Let A(m) = E(A⊗mt ) where At is defined by (2.17). Sup-pose that E(η2m

t ) <∞ and that the spectral radius

ρ(A(m)) < 1.

Then, for any t ∈ Z, the series (zt) defined in (2.18) converges in Lm and the process (ε2

t ),defined as the first component of z

t, is strictly stationary and admits moments up to order m.

Conversely, if ρ(A(m)) ≥ 1, there exists no strictly stationary solution (εt ) to (2.5) such thatE(ε2m

t ) <∞.

Example 2.3 (Moments of a GARCH(1, 1) process) When p = q = 1, the matrix At iswritten as

At = (η2t , 1)′(α1, β1).

Hence, all the eigenvalues of the matrix A(m) = E{(η2t , 1)

′⊗m}(α1, β1)⊗m are null except one. The

nonzero eigenvalue is thus the trace of A(m). It readily follows that the necessary and sufficientcondition for the existence of E(ε2m

t ) is

m∑i=0

(m

i

)αi1β

m−i1 µ2i < 1 (2.54)

where µ2i = E(η2it ), i = 0, . . . , m. The moments can be computed recursively, by expanding

E(z⊗mt

) = E(bt + Atzt−1)⊗m. For the fourth-order moment, a direct computation gives

E(ε4t ) = E(σ 4

t )E(η4t )

= µ4{ω2 + 2ω(α1 + β1)E(ε

2t−1)+ (β2

1 + 2α1β1)E(σ4t−1)+ α2

1E(ε4t−1)

}and thus

E(ε4t ) =

ω2(1+ α1 + β1)

(1− µ4α21 − β2

1 − 2α1β1)(1− α1 − β1)µ4,

provided that the denominator is positive. Figure 2.10 shows the zones of second-order and fourth-order stationarity for the strong GARCH(1, 1) model when ηt ∼ N(0, 1).

9 Only even-order moments are considered, because if a symmetry assumption is made on the distributionof ηt , the odd-order moments are null when they exist. If this symmetry assumption is not made, computingthese moments seems extremely difficult.

Page 62: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

46 GARCH MODELS

0.0 0.2 0.4 0.6a1

b1

0.8 1.0 1.20.0

0.2

0.4

0.6

0.8

1.0

1

3

2

Figure 2.10 Regions of moments existence for the GARCH(1, 1) model: 1, moment of order 4;1 and 2, moment of order 2; 3, infinite variance.

This example shows that for a nontrivial GARCH process, that is, when the αi and βj are not allequal to zero, the moments cannot exist for any order.

Proof of Theorem 2.9. For k > 0, let

At,k = AtAt−1 · · ·At−k+1, and zt,k= At,kbt−k,

with the convention that At,0 = Ip+q and zt,0 = bt . Notice that the components of z

t:=∑∞

k=0 zt,kare almost surely defined in [0,+∞) ∪ {∞}, without any restriction on the model coefficients. Let‖ · ‖ denote the matrix norm such that ‖A‖ =∑i,j |aij |. Using the elementary equalities

‖A‖‖B‖ = ‖A⊗ B‖ = ‖B ⊗ A‖

and the associativity of the Kronecker product, we obtain, for k > 0,

E‖zt,k‖m = E‖At,kbt−k ⊗ · · · ⊗ At,kbt−k‖ = ‖E(At,kbt−k ⊗ · · · ⊗ At,kbt−k)‖,

since the elements of the matrix At,kbt−k are positive. For any vector X conformable to the matrixA, we have

(AX)⊗m = A⊗mX⊗m

by the property of the Kronecker product, AB ⊗ CD = (A⊗ C)(B ⊗D), for any matrices suchthat the products AB and CD are well defined. It follows that

E‖zt,k‖m = ‖E(At,k

⊗mbt−k⊗m)‖ = ‖E(A⊗mt . . . A⊗mt−k+1b

⊗mt−k)‖. (2.55)

Let b(m) = E(b⊗mt ) and recall that A(m) = E(A⊗mt ). In view of (2.55), we get

E‖zt,k‖m = ‖(A(m))kb(m)‖,

Page 63: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 47

using the independence between the matrices in the product At . . . At−k+1bt−k (since At−i is afunction of ηt−i). The matrix norm being multiplicative, it follows, using (2.18), that

‖zt‖m = {E‖zt‖m}1/m

≤∞∑k=0

‖zt,k‖m

≤{ ∞∑k=0

‖(A(m))k‖1/m

}‖b(m)‖1/m. (2.56)

If the spectral radius of the matrix A(m) is strictly less than 1, then ‖(A(m))k‖ converges to zeroat exponential rate when k tends to infinity. In this case, z

tis almost surely finite. It is the strictly

stationary solution of equation (2.16), and this solution belongs to Lm. It is clear, on the otherhand, that

‖ε2t ‖m ≤ ‖zt‖m

because the norm of zt

is greater than that of any of its components. A sufficient condition for theexistence of E(ε2m

t ) is then ρ(A(m)) < 1.Conversely, suppose that (ε2

t ) belongs to Lm. For any vectors x and y of the same dimension,let x ≤ y mean that the components of y − x are all positive. Then, for any n ≥ 0,

E(z⊗mt

) = E(zt,0 + · · · + z

t,n+ At . . . At−nzt−n−1)

⊗m

≥ E

(n∑

k=0

zt,k

)⊗m

≥n∑

k=0

E(z⊗mt,k

)

=n∑

k=0

(A(m))kb(m)

because all the terms involved in theses expressions are positive. Since the components of E(z⊗mt

)

are finite, we have

limn→∞(A

(m))nb(m) = 0. (2.57)

To conclude, it suffices to show that

limn→∞(A

(m))n = 0 (2.58)

because this is equivalent to ρ(A(m)) < 1. To deduce (2.58) from (2.57), we need to show that forany fixed integer k,

the components of (A(m))kb(m) are all strictly positive. (2.59)

The previous computations showed that

(A(m))kb(m) = E(z⊗mt,k

).

Given the form of the matrices At , the qth component of zt,q

is the first component of bt−q , whichis not almost surely equal to zero. First suppose that αq and βp are both not equal to zero. In thiscase the first component of z

t,kcannot be equal to zero almost surely, for any k ≥ q + 1. Also,

Page 64: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

48 GARCH MODELS

still in view of the form of the matrices At , the ith component of zt,k

is the (i − 1)th componentof z

t−1,k−1 for i = 2, . . . , q. Hence, none of the first q components of zt,2q can be equal to zero

almost surely, and the same property holds for zt,k

whatever k ≥ 2q. The same argument showsthat none of the last q components of z

t,kis equal to zero almost surely when k ≥ 2p. Taking

into account the positivity of the variables zt,k

, this shows that in the case αqβp = 0, (2.59) holdstrue for k ≥ max{2p, 2q}. If αqβp = 0, one can replace z

tby a vector of smaller size, obtained

by canceling the component ε2t−q+1 if αq = 0 and the component σ 2

t−p+1 if βp = 0. The matrixAt is then replaced by a matrix of smaller size, but with the same nonzero eigenvalues as At .Similarly, the matrix A(m) will be replaced by a matrix of smaller size but with the same nonzeroeigenvalues. If αq−1βp−1 = 0 we are led to the preceding case, otherwise we pursue the dimensionreduction. �

2.4.2 KurtosisAn easy way to measure the size of distribution tails is to use the kurtosis coefficient. Thiscoefficient is defined, for a centered (zero-mean) distribution, as the ratio of the fourth-ordermoment, which is assumed to exist, to the squared second-order moment. This coefficient is equalto 3 for a normal distribution, this value serving as a gauge for the other distributions. In the caseof GARCH processes, it is interesting to note the difference between the tails of the marginal andconditional distributions. For a strictly stationary solution (εt ) of the GARCH(p, q) model definedby (2.5), the conditional moments of order k are proportional to σ 2k

t :

E(ε2kt | εu, u < t) = σ 2k

t E(η2kt ).

The kurtosis coefficient of this conditional distribution is thus constant and equal to the kurtosiscoefficient of ηt . For a general process of the form

εt = σtηt ,

where σt is a measurable function of the past of εt , ηt is independent of this past and (ηt ) is iidcentered, the kurtosis coefficient of the stationary marginal distribution is equal, provided that itexists, to

κε := E(ε4t )

{E(ε2t )}2

= E{E(ε4

t | εu, u < t)}[

E{E(ε2

t | εu, u < t)}]2 = E(σ 4

t )

{E(σ 2t )}2

κη

where κη = Eη4t denotes the kurtosis coefficient of (ηt ). It can thus be seen that the tails of

the marginal distribution of (εt ) are fatter when the variance of σ 2t is large relative to the squared

expectation. The minimum (corresponding to the absence of ARCH effects) is given by the kurtosiscoefficient of (ηt ),

κε ≥ κη,

with equality if and only if σ 2t is almost surely constant. In the GARCH(1, 1) case we thus have,

from the previous calculations,

κε = 1− (α1 + β1)2

1− (α1 + β1)2 − α21(κη − 1)

κη. (2.60)

The excess kurtosis coefficients of εt and ηt , relative to the normal distribution, are related by

κ∗ε = κε − 3 = 6α21 + κ∗η {1− (α1 + β1)

2 + 3α21}

1− (α1 + β1)2 − 2α21 − κ∗ηα

21

, κ∗η = κη − 3.

The excess kurtosis of εt increases with that of ηt and when the GARCH coefficientsapproach the zone of nonexistence of the fourth-order moment. Notice the asymmetry between the

Page 65: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 49

GARCH coefficients in the excess kurtosis formula. For the general GARCH(p, q), we have thefollowing result.

Proposition 2.1 (Excess kurtosis of a GARCH(p, q) process) Let (εt ) denote a GARCH(p, q)process admitting moments up to order 4 and let κ∗ε be its excess kurtosis coefficient. Let a =∑∞

i=1 ψ2i where the coefficients ψi are defined by

1+∞∑i=1

ψizi =

1−max(p,q)∑

i=1

(αi + βi)zi

−1 (

1−p∑i=1

βizi

).

Then the excess kurtosis of the distribution of εt relative to the Gaussian is

κ∗ε =6a + κ∗η (1+ 3a)

1− a(κ∗η + 2).

Proof. The ARMA(max(p, q), p) representation (2.4) implies

ε2t = Eε2

t + νt +∞∑i=1

ψiνt−i ,

where∑∞

i=1 |ψi | <∞ follows from the condition∑max(p,q)

i=1 (αi + βi) < 1, which is a consequenceof the existence of Eε4

t . The process (νt ) = (σ 2t (η

2t − 1)) is a weak white noise of variance

Var(νt ) = Eσ 4t E(η

2t − 1)2 = (κη − 1)Eσ 4

t . It follows that

Var(ε2t ) = Var(νt )+

∞∑i=1

ψ2i Var(νt−i ) = (a + 1)(κη − 1)Eσ 4

t

= Eε4t − (Eε2

t )2 = κηEσ

4t − (Eε2

t )2,

hence

Eσ 4t =

(Eε2t )

2

κη − (a + 1)(κη − 1)

andκε = κη

κη − (a + 1)(κη − 1)= κη

1− a(κη − 1),

and the proposition follows. �

It will be seen in Chapter 7 that the Gaussian quasi-maximum likelihood estimator of the coefficientsof a GARCH model is consistent and asymptotically normal even if the distribution of the variablesηt is not Gaussian. Since the autocorrelation function of the squares of the GARCH process doesnot depend on the law of ηt , the autocorrelations obtained by replacing the unknown coefficients bytheir estimates are generally very close to the empirical autocorrelations. In contrast, the kurtosiscoefficients obtained from the theoretical formula, by replacing the coefficients by their estimatesand the kurtosis of ηt by 3, can be very far from the coefficients obtained empirically. This is notvery surprising since the preceding result shows that the difference between the kurtosis coefficientsof εt computed with a Gaussian and a non-Gaussian distribution for ηt ,

κ∗ε (κ∗η )− κ∗ε (0) =

κ∗η (1− a)

(1− 2a){1− a(κ∗η + 2)} ,

is not bounded as a approaches (κ∗η + 2)−1.

Page 66: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

50 GARCH MODELS

2.5 Autocovariances of the Squares of a GARCH

We have seen that if (εt ) is a GARCH process which is fourth-order stationary, then (ε2t ) is an

ARMA process. It must be noted that this ARMA is very constrained, as can be seen from (2.4):the order of the AR part is larger than that of the MA part, and the AR coefficients are greaterthan those of the MA part, which are positive. We shall start by examining some consequencesof these constraints on the autocovariances of (ε2

t ). Then we shall show how to compute theseautocovariances explicitly.

2.5.1 Positivity of the AutocovariancesFor a GARCH(1, 1) model such that Eε4

t <∞, the autocorrelations of the squares take the form

ρε2 (h) := Corr(ε2t , ε

2t−h) = ρε2(1)(α1 + β1)

h−1, h ≥ 1, (2.61)

where

ρε2 (1) = α1{1− β1(α1 + β1)}1− (α1 + β1)2 + α2

1

(2.62)

(Exercise 2.8). It follows immediately that these autocorrelations are nonnegative. The next propertygeneralizes this result.

Proposition 2.2 (Positivity of the autocovariances) If the GARCH(p, q) process (εt ) admitsmoments of order 4, then

γε2 (h) = Cov(ε2t , ε

2t−h) ≥ 0, ∀h.

If, moreover, α1 > 0, thenγε2(h)> 0, ∀h.

Proof. It suffices to show that in the MA(∞) expansion of ε2t , all the coefficients are nonnegative

(and strictly positive when α1 > 0). In the notation introduced in(2.2), this expansion takes the form

ε2t = {1− (α + β)(1)}−1ω + {1− (α + β)(B)}−1(1− β(B))ut := ω∗ + ψ(B)ut .

Noting that 1− β(B) = 1− (α + β)(B)+ α(B), it suffices to show the nonnegativity of the coef-ficients ci in the series expansion

α(B)

1− (α + β)(B)=

∞∑i=1

ciBi.

We show by induction thatci ≥ α1(α1 + β1)

i−1, i ≥ 1.

Obviously c1 = α1. Moreover, with the convention αi = 0 if i > q and βj = 0 if j >p,

ci+1 = c1(αi + βi)+ . . .+ ci(α1 + β1)+ αi+1.

If the inductive assumption holds true at order i, using the positivity of the GARCH coefficientswe deduce that ci+1 ≥ α1(α1 + β1)

i . Hence the desired result. �

Remark 2.7 The property that the autocovariances are nonnegative is satisfied, more generally,for an ARCH(∞) process of the form (2.40), provided it is fourth-order stationary. It can also be

Page 67: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 51

shown that the square of an ARCH(∞) process is associated . On these properties see Giraitis,Leipus and Surgailis (2009) and references therein.

Note that the property of positive autocorrelations for the squares, or for the absolute values, istypically observed on real financial series (see, for instance, the second row of Table 2.1).

2.5.2 The Autocovariances Do Not Always DecreaseFormulas (2.61) and (2.62) show that for a GARCH(1, 1) process, the autocorrelations of thesquares decrease. An illustration is provided in Figure 2.11. A natural question is whether thisproperty remains true for more general GARCH(p, q) processes. The following computation showsthat this is not the case. Consider an ARCH(2) process admitting moments of order 4 (the existencecondition and the computation of the fourth-order moment are the subject of Exercise 2.7):

εt =√ω + α1ε

2t−1 + α2ε

2t−2 ηt .

We know that ε2t is an AR(2) process, whose autocorrelation function satisfies

ρε2(h) = α1ρε2(h− 1)+ α2ρε2(h− 2), h> 0.

It readily follows that

ρε2(2)

ρε2(1)= α2

1 + α2(1− α2)

α1

and hence that

ρε2(2) < ρε2(1) ⇐⇒ α2(1− α2) < α1(1− α1).

The latter equality is of course true for the ARCH(1) process (α2 = 0) but is not true for any(α1, α2). Figure 2.12 gives an illustration of this nondecreasing feature of the first autocorrelations(and partial autocorrelations). The sequence of autocorrelations is, however, decreasing after acertain lag (Exercise 2.16).

2 4 6 8 10 12

0.1

0.2

0.3

0.4

2 4 6 8 10 12h h

0.1

0.2

0.3

0.4

Figure 2.11 Autocorrelation function (left) and partial autocorrelation function (right) of thesquares of the GARCH(1, 1) model εt = σtηt , σ 2

t = 1+ 0.3ε2t−1 + 0.55σ 2

t−1, (ηt ) iid N(0, 1).

Page 68: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

52 GARCH MODELS

2 4 6 8 10 12

0.05

h h

0.1

0.15

0.2

0.25

2 4 6 8 10 12

0.05

0.1

0.15

0.2

0.25

Figure 2.12 Autocorrelation function (left) and partial autocorrelation function (right) of thesquares of the ARCH(2) process εt = σtηt , σ 2

t = 1+ 0.1ε2t−1 + 0.25ε2

t−2, (ηt ) iid N(0, 1).

2.5.3 Explicit Computation of the Autocovariances of the SquaresThe autocorrelation function of (ε2

t ) will play an important role in identifying the orders of themodel. This function is easily obtained from the ARMA(max(p, q), p) representation

ε2t −

max(p,q)∑i=1

(αi + βi)ε2t−i = ω + νt −

p∑i=1

βiνt−i .

The autocovariance function is more difficult to obtain (Exercise 2.8) because one has to compute

Eν2t = E(η2

t − 1)2Eσ 4t .

One can use the method of Section 2.4.1. Consider the vector representation

zt= bt +Atzt−1

defined by (2.16) and (2.17). Using the independence between zt

and (bt , At ), together withelementary properties of the Kronecker product ⊗, we get

Ez⊗2t= E(bt + Atzt−1)⊗ (bt + Atzt−1)

= Ebt ⊗ bt + EAt zt−1 ⊗ bt + Ebt ⊗ Atzt−1 + EAt zt−1 ⊗ Atzt−1

= Eb⊗2t + EAt ⊗ btEzt−1 + Ebt ⊗ AtEzt−1 + EA⊗2

t Ez⊗2t−1.

Thus

Ez⊗2t= (I(p+q)2 −A(2))−1 {

b(2) + (EAt ⊗ bt + Ebt ⊗ At

)z(1)}, (2.63)

whereA(m) = E(A⊗mt ), z(m) = Ez⊗m

t, b(m) = E(b⊗mt ).

To compute A(m), we can use the decomposition At = η2t B + C, where B and C are deterministic

matrices. We then have, letting µm = Eηmt ,

A(2) = E(η2t B + C)⊗ (η2

t B + C) = µ4B⊗2 + B ⊗ C + C ⊗ B + C⊗2.

Page 69: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 53

We obtain EAt ⊗ bt and Ebt ⊗ At similarly. All the components of z(1) are equal to ω/(1−∑αi −

∑βi). Note that for h> 0, we have

Ezt⊗ z

t−h = E(bt + Atzt−1

)⊗ zt−h

= b(1) ⊗ z(1) + (A(1) ⊗ Ip+q)Ez

t⊗ z

t−h+1. (2.64)

Let e1 = (1, 0, . . . , 0) ∈ R(p+q)2 . The following algorithm can be used:

• Define the vectors z(1), b(1), b(2), and the matrices EAt ⊗ bt , Ebt ⊗ At , A(1), A(2) as afunction of αi , βi , and ω, µ4.

• Compute Ez⊗2t

from (2.63).

• For h = 1, 2, . . . , compute Ezt⊗ z

t−h from (2.64).

• For h = 0, 1, . . . , compute γε2 (h) = e1Ezt ⊗ zt−h −

(e1z

(1))2

.

This algorithm is not very efficient in terms of computation time and memory space, but it is easyto implement.

2.6 Theoretical Predictions

The definition of GARCH processes in terms of conditional expectations allows us to compute theoptimal predictions of the process and its square given its infinite past. Let (εt ) be a stationaryGARCH(p, q) process, in the sense of Definition 2.1. The optimal prediction (in the L2 sense) ofεt given its infinite past is 0 by Definition 2.1(i). More generally, for h ≥ 0,

E(εt+h | εu, u < t) = E{E(εt+h | εt+h−1) | εu, u < t} = 0, t ∈ Z,

which shows that the optimal prediction of any future variable given the infinite past is zero. Themain attraction of GARCH models obviously lies not in the prediction of the GARCH processitself but in the prediction of its square. The optimal prediction of ε2

t given the infinite past of εtis σ 2

t . More generally, the predictions at horizon h ≥ 0 are obtained recursively by

E(ε2t+h | εu, u < t) = E(σ 2

t+h | εu, u < t)

= ω +q∑i=1

αiE(ε2t+h−i | εu, u < t)+

p∑j=1

βjE(σ2t+h−j | εu, u < t),

with, for i ≤ h,E(ε2

t+h−i | εu, u < t) = E(σ 2t+h−i | εu, u < t),

for i > h,E(ε2

t+h−i | εu, u < t) = ε2t+h−i ,

and for i ≥ h,E(σ 2

t+h−i | εu, u < t) = σ 2t+h−i .

These predictions coincide with the optimal linear predictions of the future values of ε2t given

its infinite past. We shall consider in Chapter 4 a more general class of GARCH models (weakGARCH) for which the two types of predictions, optimal and linear optimal, do not necessarilycoincide.

It is important to note that E(ε2t+h | εu, u < t) = Var(εt+h | εu, u < t) is the conditional vari-

ance of the prediction error of εt+h. Hence, the accuracy of the predictions depends on the past:

Page 70: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

54 GARCH MODELS

it is particularly low after a turbulent period, that is, when the past values are large in absolutevalue (assuming that the coefficients αi and βj are nonnegative). This property constitutes a crucialdifference with standard ARMA models, for which the magnitude of the prediction intervals isconstant, for a given horizon.

Figures 2.13–2.16, based on simulations, allow us to visualize this difference. In Figure 2.13,obtained from a Gaussian white noise, the predictions at horizon 1 have a constant variance:the confidence interval [−1.96, 1.96] contains roughly 95% of the realizations. Using a constantinterval for the next three series, displayed in Figures 2.14–2.16, would imply very bad results. Incontrast, the intervals constructed here (for conditionally Gaussian distributions, with zero meanand variance σ 2

t ) do contain about 95% of the observations: in the quiet periods a small intervalis enough, whereas in turbulent periods the variability increases and larger intervals are needed.

For a strong GARCH process it is possible to go further, by computing optimal predictions ofthe powers of ε2

t , provided that the corresponding moments exist for the process (ηt ). For instance,computing the predictions of ε4

t allows us to evaluate the variance of the prediction errors of ε2t .

However, these computations are tedious, the linearity property being lost for such powers.When the GARCH process is not directly observed but is the innovation of an ARMA process,

the accuracy of the prediction at some date t directly depends of the magnitude of the conditionalheteroscedasticity at this date. Consider, for instance, a stationary AR(1) process, whose innovationis a GARCH(1, 1) process:

Xt = φXt−1 + εtεt = σtηtσ 2t = ω + αε2

t−1 + βσ 2t−1,

(2.65)

where ω> 0, α ≥ 0, β ≥ 0, α + β ≤ 1 and |φ| < 1. We have, for h ≥ 0,

Xt+h = εt+h + φεt+h−1 + · · · + φhεt + φh+1Xt−1.

Hence,E(Xt+h|Xu, u < t) = φh+1Xt−1

100 200 300 400 500

−3

−1

1

3

Figure 2.13 Prediction intervals at horizon 1, at 95%, for the strong N(0, 1) white noise.

Page 71: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 55

100 200 300 400 500

−12

−7

−2

3

8

13

Figure 2.14 Prediction intervals at horizon 1, at 95%, for the GARCH(1, 1) process simulatedwith ω = 1, α = 0.1, β = 0.8 and N(0, 1) distribution for (ηt ).

100 200 300 400 500

−30

−20

−10

0

10

20

30

Figure 2.15 Prediction intervals at horizon 1, at 95%, for the GARCH(1, 1) process simulatedwith ω = 1, α = 0.6, β = 0.2 and N(0, 1) distribution for (ηt ).

Page 72: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

56 GARCH MODELS

100 200 300 400 500−100

−50

0

50

Figure 2.16 Prediction intervals at horizon 1, at 95%, for the GARCH(1, 1) process simulatedwith ω = 1, α = 0.7, β = 0.3 and N(0, 1) distribution for (ηt ).

since the past of Xt coincides with that of its innovation εt . Moreover,

Var(Xt+h|Xu, u < t) = Var

(h∑i=0

φh−iεt+i | εu, u < t

)

=h∑i=0

φ2(h−i)Var(εt+i | εu, u < t).

Since Var(εt | εu, u < t) = σ 2t and, for i ≥ 1,

Var(εt+i | εu, u < t) = E(σ 2t+i | εu, u < t)

= ω + (α + β)E(σ 2t+i−1 | εu, u < t)

= ω{1+ · · · + (α + β)i−1} + (α + β)iσ 2t ,

we have

Var(εt+i | εu, u < t) = ω1− (α + β)i

1− (α + β)+ (α + β)iσ 2

t , for all i ≥ 0.

Consequently,

Var(Xt+h | Xu, u < t)

=(

h∑i=0

φ2(h−i))

ω

1− (α + β)+

h∑i=0

(α + β)iφ2(h−i){σ 2t −

ω

1− (α + β)

}

= ω(1− φ2(h+1))

{1− (α + β)}(1− φ2)+{σ 2t −

ω

1− (α + β)

}φ2(h+1) − (α + β)(h+1)

φ2 − (α + β)

Page 73: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 57

if φ2 = α + β and

Var(Xt+h | Xu, u < t) = ω(1− φ2(h+1))

(1− φ2)2+{σ 2t −

ω

1− (α + β)

}(h+ 1)φ2h

if φ2 = α + β. The coefficient of σ 2t − ω

1−(α+β) always being positive, it can be seen that thevariance of the prediction at horizon h increases linearly with the difference between the conditionalvariance at time t and the unconditional variance of εt . A large negative difference (correspondingto a low-volatility period) thus results in highly accurate predictions. Conversely, the accuracydeteriorates when σ 2

t is large. When the horizon h increases, the importance of this factor decreases.If h tends to infinity, we retrieve the unconditional variance of Xt :

limh→∞

Var(Xt+h | Xu, u < t) = Var(Xt ) = Var(εt )

1− φ2.

Now we consider two nonstationary situations. If |φ| = 1, and initializing, for instance at 0,all the variables at negative dates (because here the infinite pasts of Xt and εt do not coincide),the previous formula becomes

Var(Xt+h | Xu, u < t) = ωh

{1− (α + β)} +{σ 2t −

ω

1− (α + β)

}1− (α + β)(h+1)

1− (α + β).

Thus, the impact of the observations before time t does not vanish as h increases. It becomesnegligible, however, compared to the deterministic part which is proportional to h. If |φ| < 1 andα + β = 1 (IGARCH(1, 1) errors), we have

Var(εt+i | εu, u < t) = ωi + σ 2t , for all i ≥ 0,

and it can be seen that the impact of the past variables on the variance of the predictions remainsconstant as the horizon increases. This phenomenon is called persistence of shocks on the volatility.Note, however, that, as in the preceding case, the nonrandom part of the decomposition of Var(εt+i |εu, u < t) becomes dominant when the horizon tends to infinity. The asymptotic precision of thepredictions of εt is null, and this is also the case for Xt since

Var(Xt+h | Xu, u < t) ≥ Var(εt+h | εu, u < t).

2.7 Bibliographical Notes

The strict stationarity of the GARCH(1, 1) model was first studied by Nelson (1990a) under theassumption E log+ η2

t <∞. His results were extended by Kluppelberg, Lindner and Maller (2004)to the case of E log+ η2

t = +∞. For GARCH(p, q) models, the strict stationarity conditions wereestablished by Bougerol and Picard (1992b). For model (2.16), where (At , bt ) is a strictly stationaryand ergodic sequence with a logarithmic moment, Brandt (1986) showed that γ < 0 ensures theexistence of a unique strictly stationary solution. In the case where (At , bt ) is iid, Bougerol andPicard (1992a) established the converse property showing that, under an irreducibility condition,a necessary condition for the existence of a strictly stationary and nonanticipative solution is thatγ < 0. Liu (2007) used Representation (2.30) to obtain stationarity conditions for more generalGARCH models. The second-order stationarity condition for the GARCH(p, q) model was obtainedby Bollerslev (1986), as well as the existence of an ARMA representation for the square of aGARCH (see also Bollerslev, 1988). Nelson and Cao (1992) obtained necessary and sufficientpositivity conditions for the GARCH(p, q) model. These results were recently extended by Tsaiand Chan (2008).

GARCH are not the only models that generate uncorrelated times series with correlated squares.Another class of interest which shares these characteristics is that of the all-pass time series models

Page 74: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

58 GARCH MODELS

(ARMA models in which the roots of the AR polynomials are reciprocals of roots of the MApolynomial and vice versa) studied by Breidt, Davis and Trindade (2001).

ARCH(∞) models were introduced by Robinson (1991); see Giraitis, Leipus and Surgailis(2009) for the study of these models. The condition for the existence of a strictly stationaryARCH(∞) process was established by Robinson and Zaffaroni (2006) and Douc, Roueff andSoulier (2008). The condition for the existence of a second-order stationary solution, as well asthe positivity of the autocovariances of the squares, were obtained by Giraitis, Kokoszka andLeipus (2000). Theorems 2.8 and 2.7 were proven by Kazakevicius and Leipus (2002, 2003).The uniqueness of an ARCH(∞) solution is discussed in Kazakevicius and Leipus (2007). Theasymptotic properties of quasi-maximum likelihood estimators were established by Robinson andZaffaroni (2006). See Doukhan, Teyssiere and Vinant (2006) for the study of multivariate extensionsof ARCH(∞) models. The introduction of FIGARCH models is due to Baillie, Bollerslev andMikkelsen (1996), but the existence of solutions was recently established by Douc, Roueff andSoulier (2008), where Corollary 2.6 is proven. LARCH(∞) models were introduced by Robinson(1991) and their probability properties studied by Giraitis, Robinson and Surgailis (2000), Giraitisand Surgailis (2002), Berkes and Horvath (2003a) and Giraitis et al. (2004). The estimation of suchmodels has been studied by Beran and Schutzner (2009), Truquet (2008) and Francq and Zakoıan(2009c).

The fourth-order moment structure and the autocovariances of the squares of GARCH processeswere analyzed by Milhøj (1984), Karanasos (1999) and He and Terasvirta (1999). The necessaryand sufficient condition for the existence of even-order moments was established by Ling andMcAleer (2002a), the sufficient part having been obtained by Chen and An (1998). Ling andMcAleer (2002b) derived an existence condition for the moment of order s, with s > 0, for a familyof GARCH processes including the standard model and the extensions presented in Chapter 10.The computation of the kurtosis coefficient for a general GARCH(p, q) model is due to Bai, Russeland Tiao (2004).

Several authors have studied the tail properties of the stationary distribution. See Mikosch andStarica (2000), Borkovec and Kluppelberg (2001), Basrak, Davis and Mikosch (2002) and Davisand Mikosch (2009).

Andersen and Bollerslev (1998) discussed the predictive qualities of GARCH, making a cleardistinction between the prediction of volatility and that of the squared returns (Exercise 2.21).

2.8 Exercises2.1 (Noncorrelation of εt with any function of its past?)

For a GARCH process does Cov(εt , f (εt−h)) = 0 hold for any function f and any h> 0?

2.2 (Strict stationarity of GARCH(1, 1) for two laws of ηt )In the GARCH(1, 1) case give an explicit strict stationarity condition when (i) the onlypossible values of ηt are −1 and 1; (ii) ηt follows a uniform distribution.

2.3 (Lyapunov coefficient of a constant sequence of matrices)Prove equality (2.21) for a diagonalizable matrix. Use the Jordan representation to extendthis result to any square matrix.

2.4 (Lyapunov coefficient of a sequence of matrices)Consider the sequence (At ) defined by At = ztA, where (zt ) is an ergodic sequence of realrandom variables such that E log+ |zt | <∞, and A is a square nonrandom matrix. Find theLyapunov coefficient γ of the sequence (At ) and give an explicit expression for the conditionγ < 0.

2.5 (Multiplicative norms)Show the results of footnote 6 on page 30.

Page 75: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 59

2.6 (Another vector representation of the GARCH(p, q) model)Verify that the vector z∗

t= (σ 2

t , . . . , σ2t−p+1, ε

2t−1, . . . , ε

2t−q+1)

′ ∈ Rp+q−1 allows us to define,for p ≥ 1 and q ≥ 2, a vector representation which is equivalent to those used in this chapter,of the form z∗

t= b∗t + A∗t z

∗t−1.

2.7 (Fourth-order moment of an ARCH(2) process)Show that for an ARCH(2) model, the condition for the existence of the moment of order 4,with µ4 = Eη4

t , is written as

α2 < 1 and µ4α21 <

1− α2

1+ α2(1− µ4α

22).

Compute this moment.

2.8 (Direct computation of the autocorrelations and autocovariances of the square of aGARCH(1, 1) process)Find the autocorrelation and autocovariance functions of (ε2

t ) when (εt ) is solution of theGARCH(1, 1) model {

εt = σtηtσ 2t = ω + αε2

t−1 + βσ 2t−1,

where (ηt ) ∼ N(0, 1) and 1− 3α2 − β2 − 2αβ > 0.

2.9 (Computation of the autocovariance of the square of a GARCH(1, 1) process by the generalmethod)Use the method of Section 2.5.3 to find the autocovariance function of (ε2

t ) when (εt ) issolution of a GARCH(1, 1) model. Compare with the method used in Exercise 2.8.

2.10 (Characteristic polynomial of EAt )Let A = EAt , where {At, t ∈ Z} is the sequence defined in (2.17).

1. Prove equality (2.38).

2. If∑q

i=1 αi +∑p

j=1 βj = 1, show that ρ(A) = 1.

2.11 (A condition for a sequence Xn to be o(n).)Let (Xn) be a sequence of identically distributed random variables, admitting an expectation.Show that

Xn

n→ 0 when n→∞

with probability 1. Prove that the convergence may fail if the expectation of Xn does notexist (an iid sequence with density f (x) = x−2 1lx≥1 may be considered).

2.12 (A case of dependent variables where the expectation of a product equals the product of theexpectations)Prove equality (2.31).

2.13 (Necessary condition for the existence of the moment of order 2s)Suppose that (εt ) is the strictly stationary solution of model (2.5) with Eε2s

t <∞, for s ∈(0, 1]. Let

z(K)t= bt +

K∑k=1

AtAt−1 . . . At−k+1bt−k. (2.66)

Page 76: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

60 GARCH MODELS

1. Show that when K →∞,

‖z(K)t− z(K−1)

t‖s → 0 a.s., E‖z(K)

t− z(K−1)

t‖s → 0.

2. Use this result to prove that E(‖AkAk−1 . . . A1b0‖s )→ 0 as k→∞.

3. Let (Xn) a sequence of ×m matrices and Y = (Y1, . . . , Ym)′ a vector which is indepen-

dent of (Xn) and such that for all i, 0 < E|Yi |s <∞. Show that, when n→∞,

E‖XnY‖s → 0 ⇒ E‖Xn‖s → 0.

4. Let A = EAt , b = Ebt and suppose there exists an integer N such that ANb> 0 (in thesense that all elements of this vector are strictly positive). Show that there exists k0 ≥ 1such that

E(‖Ak0Ak0−1 . . . A1‖s ) < 1.

5. Deduce (2.35) from the preceding question.

6. Is the condition α1 + β1 > 0 necessary?

2.14 (Positivity conditions)In the GARCH(1, q) case give a more explicit form for the conditions in (2.49). Show, bytaking q = 2, that these conditions are less restrictive than (2.20).

2.15 (A minoration for the first autocorrelations of the square of an ARCH)Let (εt ) be an ARCH(q) process admitting moments of order 4. Show that, for i = 1, . . . , q,

ρε2(i) ≥ αi .

2.16 (Asymptotic decrease of the autocorrelations of the square of an ARCH(2) process)Figure 2.12 shows that the first autocorrelations of the square of an ARCH(2) process,admitting moments of order 4, can be nondecreasing. Show that this sequence decreases aftera certain lag.

2.17 (Convergence in probability to −∞)If (Xn) and (Yn) are two independent sequences of random variables such that Xn + Yn →−∞ and Xn → −∞ in probability, then Yn →−∞ in probability.

2.18 (GARCH model with a random coefficient)Proceeding as in the proof of Theorem 2.1, study the stationarity of the GARCH(1, 1) modelwith random coefficient ω = ω(ηt−1),{

εt = σtηtσ 2t = ω(ηt−1)+ αε2

t−1 + βσ 2t−1,

(2.67)

under the usual assumptions and ω(ηt−1)> 0 a.s. Use the result of Exercise 2.17 to deal withthe case γ := E log a(ηt ) = 0.

2.19 (RiskMetrics model)The RiskMetrics model used to compute the value at risk (see Chapter 12) relies on thefollowing equations: {

εt = σtηt , (ηt ) iid N(0, 1)σ 2t = λσ 2

t−1 + (1− λ)ε2t−1

where 0 < λ < 1. Show that this model has no stationary and non trivial solution.

Page 77: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH(p, q) PROCESSES 61

2.20 (IARCH(∞) models: proof of Corollary 2.6)

1. In model (2.40), under the assumption A1 = 1, show that

∞∑i=1

φi logφi ≤ 0 and E(η20 log η2

0) ≥ 0.

2. Suppose that (2.41) holds. Show that the function f : [p, 1] �→ R defined by f (q) =log(Aqµ2q) is convex. Compute its derivative at 1 and deduce that (2.52) holds.

3. Establish the reciprocal and show that E|εt |q <∞ for any q ∈ [0, 2[.

2.21 (On the predictive power of GARCH)In order to evaluate the quality of the prediction of ε2

t obtained by using the volatility of aGARCH(1, 1) model, econometricians have considered the linear regression

ε2t = a + bσ 2

t + ut ,

where σ 2t is replaced by the volatility estimated from the model. They generally obtained

a very small determination coefficient, meaning that the quality of the regression was bad.It this surprising? In order to answer that question compute, under the assumption that Eε4

t

exists, the theoretical R2 defined by

R2 = Var(σ 2t )

Var(ε2t ).

Show, in particular, that R2 < 1/κη.

Page 78: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 79: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

3

Mixing*

It will be shown that, under mild conditions, GARCH processes are geometrically ergodic andβ-mixing. These properties entail the existence of laws of large numbers and of central limittheorems (see Appendix A), and thus play an important role in the statistical analysis of GARCHprocesses. This chapter relies on the Markov chain techniques set out, for example, by Meyn andTweedie (1996).

3.1 Markov Chains with Continuous State Space

Recall that for a Markov chain only the most recent past is of use in obtaining the conditionaldistribution. More precisely, (Xt ) is said to be a homogeneous Markov chain , evolving on a spaceE (called the state space) equipped with a σ -field E, if for all x ∈ E, and for all B ∈ E,

∀s, t ∈ N, P(Xs+t ∈ B | Xr, r < s;Xs = x) := P t(x, B). (3.1)

In this equation, P t(x, B) corresponds to the transition probability of moving from the state x

to the set B in t steps. The Markov property refers to the fact that P t (x, B) does not depend onXr, r < s. The fact that this probability does not depend on s is referred to time homogeneity . Forsimplicity we write P(x, B) = P 1(x, B). The function P : E × E→ [0, 1] is called a transitionkernel and satisfies:

(i) ∀B ∈ E, the function P(·, B) is measurable;

(ii) ∀x ∈ E, the function P(x, ·) is a probability measure on (E, E ).The law of the process (Xt ) is characterized by an initial probability measure µ and a transition

kernel P . For all integers t and all (t + 1)-tuples (B0, . . . , Bt ) of elements of E, we set

Pµ (X0 ∈ B0, . . . , Xt ∈ Bt )

=∫x0∈B0

. . .

∫xt−1∈Bt−1

µ(dx0)P (x0, dx1) . . . P (xt−1, Bt ). (3.2)

In what follows, (Xt ) denotes a Markov chain on E = Rd and E is the Borel σ -field.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 80: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

64 GARCH MODELS

Irreducibility and Recurrence

The Markov chain (Xt) is said to be φ-irreducible for a nontrivial (that is, not identically equalto zero) measure φ on (E, E), if

∀B ∈ E, φ(B)> 0 ⇒ ∀x ∈ E, ∃t > 0, P t (x, B)> 0.

If (Xt ) is φ-irreducible, it can be shown that there exists a maximal irreducibility measure, that is,an irreducibility measure M such that all the other irreducibility measures are absolutely continuouswith respect to M . If M(B) = 0 then the set of points from which B is accessible is also of zeromeasure (see Meyn and Tweedie, 1996, Proposition 4.2.2). Such a measure M is not unique, butthe set

E+ = {B ∈ E | M(B)> 0 }

does not depend on the maximal irreducibility measure M . For a particular model, finding ameasure that makes the chain irreducible may be a nontrivial problem (but see Exercise 3.1 for anexample of a time series model for which the determination of such a measure is very simple).

A φ-irreducible chain is called recurrent if

U(x,B) :=∞∑t=1

P t (x, B) = +∞, ∀x ∈ E, ∀B ∈ E+,

and is called transient if

∃(Bj )j , E =⋃j

Bj , U(x, Bj ) ≤ Mj <∞, ∀x ∈ E.

Note that U(x, B) = E∑∞

t=1 1lB(Xt) can be interpreted as the average time that the chain spendsin B when it starts at x. It can be shown that a φ-irreducible chain (Xt) is either recurrent ortransient (see Meyn and Tweedie, 1996, Theorem 8.3.4). It is said that (Xt ) is positive recurrent if

lim supt→∞

P t (x, B)> 0, ∀x ∈ E,∀B ∈ E+.

If a φ-irreducible chain is not positive recurrent, it is called null recurrent . For a φ-irreduciblechain, positive recurrence is equivalent to the existence of a (unique) invariant probability measure(see Meyn and Tweedie, 1996, Theorem 18.2.2), that is, a probability π such that

∀B ∈ E, π(B) =∫P(x,B)π(dx).

An important consequence of this equivalence is that, for Markov time series, the issue of findingstrict stationarity conditions reduces to that of finding conditions for positive recurrence. Indeed,it can be shown (see Exercise 3.2) that for any chain (Xt ) with initial measure µ,

(Xt ) is stationary ⇔ µ is invariant. (3.3)

For this reason, the invariant probability is also called the stationary probability .

Page 81: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MIXING 65

Small Sets and Aperiodicity

For a φ-irreducible chain, there exists a class of sets enjoying properties that are similar to thoseof the elementary states of a finite state space Markov chain. A set C ∈ E is called a small set1 ifthere exist an integer m ≥ 1 and a nontrivial measure ν on E such that

∀x ∈ C, ∀B ∈ E, Pm(x, B) ≥ ν(B).

In the AR(1) case, for instance, it is easy to find small sets (see Exercise 3.4). For more sophisti-cated models, the definition is not sufficient and more explicit criteria are needed. For the so-calledFeller chains, we will see below that it is very easy to find small sets. For a general chain, wehave the following criterion (see Nummelin, 1984, Proposition 2.11): C ∈ E+ is a small set if thereexists A ∈ E+ such that, for all B ⊂ A, B ∈ E+, there exists T > 0 such that

infx∈C

T∑t=0

P t (x, B)> 0.

If the chain is φ-irreducible, it can be shown that there exists a countable cover of E by smallsets. Moreover, each set B ∈ E+ contains a small set C ∈ E+. The existence of small sets allowsus to define cycles for φ-irreducible Markov chains with general state space, as in the case ofcountable space chains. More precisely, the period is the greatest common divisor (gcd) of the set

{n ≥ 1 | ∀x ∈ C, ∀B ∈ E, P n(x, B) ≥ δnν(B), for some δn > 0},

where C ∈ E+ is any small set (the gcd is independent of the choice of C). When d = 1, the chainis said to be aperiodic. Moreover, it can be shown (see Meyn and Tweedie, 1996, Theorem 5.4.4.)that there exist disjoint sets D1, . . . , Dd ∈ E such that (with the convention Dd+1 = D1):

(i) ∀i = 1, . . . , d,∀x ∈ Di, P (x,Di+1) = 1;

(ii) φ(E −⋃Di) = 0.

A necessary and sufficient condition for the aperiodicity of (Xt ) is that there exists A ∈ E+ suchthat for all B ⊂ A, B ∈ E+, there exists t > 0 such that

P t(x, B)> 0 and P t+1(x, B)> 0 ∀x ∈ B (3.4)

(see Chan, 1990, Proposition A1.2).

Geometric Ergodicity and Mixing

In this section, we study the convergence of the probability Pµ(Xt ∈ ·) to a probability π(·)independent of the initial probability µ, as t →∞.

It is easy to see that if there exists a probability measure π such that, for an initial measure µ,

∀B ∈ E, Pµ(Xt ∈ B)→ π(B), when t →+∞, (3.5)

where Pµ(Xt ∈ B) is defined in (3.2) (for (B0, . . . , Bt ) = (E, . . . , E,B)), then the probability π

is invariant (see Exercise 3.3). Note also that (3.5) holds for any measure µ if and only if

∀B ∈ E, ∀x ∈ E, P t (x, B)→ π(B), when t →+∞.

1 Meyn and Tweedie (1996) introduce a more general notion, called a ‘petite set’, obtained by replac-ing, in the definition, the transition probability in m steps by an average of the transition probabilities,∑∞

m=0 amPm(x, B), where (am) is a probability distribution.

Page 82: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

66 GARCH MODELS

On the other hand, if the chain is irreducible, aperiodic and admits an invariant probability π , forπ-almost all x ∈ E,

‖ P t(x, .)− π ‖→ 0 when t →+∞, (3.6)

where ‖ · ‖ denotes the total variation norm2 (see Meyn and Tweedie, 1996, Theorem 14.0.1).A chain (Xt ) such that the convergence (3.6) holds for all x is said to be ergodic. However, thisconvergence is not sufficient for mixing. We will define a stronger notion of ergodicity.

The chain (Xt ) is called geometrically ergodic if there exists ρ ∈ (0, 1) such that

∀x ∈ E, ρ−t ‖ P t (x, .)− π ‖→ 0 when t →+∞. (3.7)

Geometric ergodicity entails the so-called α- and β-mixing. The general definition of the α- andβ-mixing coefficients is given in Appendix A.3.1. For a stationary Markov process, the definitionof the α-mixing coefficient reduces to

αX(k) = supf,g

|Cov(f (X0), g(Xk))|,

where the first supremum is taken over the set of the measurable functions f and g such that|f | ≤ 1, |g| ≤ 1 (see Bradley, 1986, 2005). A general process X = (Xt ) is said to be α-mixing(β-mixing) if αX(k) (βX(k)) converges to 0 as k→∞. Intuitively, these mixing properties charac-terize the decrease in dependence when past and future become sufficiently far apart. The α-mixingis sometimes called strong mixing, but β-mixing entails strong mixing because αX(k) ≤ βX(k) (seeAppendix A.3.1).

Davydov (1973) showed that for an ergodic Markov chain (Xt), of invariant probabilitymeasure π ,

βX(k) =∫‖ P k(x, .)− π ‖ π(dx).

It follows that βX(k) = O(ρk) if the convergence (3.7) holds. Thus

(Xt) is stationary and geometrically ergodic

⇒ (Xt) is geometrically β-mixing. (3.8)

Two Ergodicity Criteria

For particular models, it is generally not easy to directly verify the properties of recurrence,existence of an invariant probability law, and geometric ergodicity. Fortunately, there exist simplecriteria on the transition kernel.

We begin by defining the notion of Feller chain. The Markov chain (Xt ) is said to be aFeller chain if, for all bounded continuous functions g defined on E, the function of x defined byE(g(Xt )|Xt−1 = x) is continuous. For instance, for an AR(1) we have, with obvious notation,

E{g(Xt) | Xt−1 = x} = E{g(θx + εt )}.

The continuity of the function x → g(θx + y) for all y, and its boundedness, ensure, by theLebesgue dominated convergence theorem, that (Xt ) is a Feller chain. For a Feller chain, thecompact sets C ∈ E+ are small sets (see Feigin and Tweedie, 1985).

The following theorem provides an effective way to show the geometric ergodicity (and thusthe β-mixing) of numerous Markov processes.

2 The total variation norm of a (signed) measure m is defined by ‖m‖ = sup∫f dm, where the supremum

is taken over {f : E −→ R, f measurable and |f | ≤ 1}.

Page 83: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MIXING 67

Theorem 3.1 (Feigin and Tweedie (1985, Theorem 1)) Assume that:

(i) (Xt) is a Feller chain;

(ii) (Xt) is φ-irreducible;

(iii) there exist a compact set A ⊂ E such that φ(A)> 0 and a continuous function V : E→R+ satisfying

V (x) ≥ 1, ∀x ∈ A, (3.9)

and for δ > 0,

E {V (Xt )|Xt−1 = x} ≤ (1− δ)V (x), ∀x /∈ A. (3.10)

Then (Xt) is geometrically ergodic.

This theorem will be applied to GARCH processes in the next section (see also Exercise 3.5for a bilinear example). In equation (3.10), V can be interpreted as an energy function. When thechain is outside the center A of the state space, the energy dissipates, on average. When the chainlies inside A, the energy is bounded, by the compactness of A and the continuity of V . SometimesV is called a test function and (iii) is said to be a drift criterion.

Let us explain why these assumptions imply the existence of an invariant probability measure.For simplicity, assume that the test function V takes its values in [1,+∞), which will be thecase for the applications to GARCH models we will present in the next section. Denote by Pthe operator which, to a measurable function f in E, associates the function Pf defined by

∀x ∈ E, Pf (x) =∫E

f (y)P (x, dy) = E {f (Xt ) | Xt−1 = x} .

Let P t be the t th iteration of P, obtained by replacing P(x, dy) by P t (x, dy) in the previous inte-gral. By convention P0f = f and P 0(x, A) = 1lA. Equations (3.9) and (3.10) and the boundednessof V by some M> 0 on A yield an inequality of the form

PV ≤ (1− δ)V + b 1lA

where b = M − (1− δ)., Iterating this relation t times, we obtain, for x0 ∈ A

∀t ≥ 0, Pt+1V (x0) ≤ (1− δ)Pt V (x0)+ bP t(x0, A). (3.11)

It follows (see Exercise 3.6) that there exists a constant κ > 0 such that for n large enough,

Qn(x0, A) ≥ κ where Qn(x0, A) = 1

n

n∑t=1

P t (x0, A). (3.12)

The sequence Qn(x0, ·) being a sequence of probabilities on (E, E), it admits an accumulationpoint for vague convergence: there exist a measure π of mass less than 1 and a subsequence (nk)such that for all continuous functions f with compact support,

limk→∞

∫E

f (y)Qnk (x0, dy) =∫E

f (y)π(dy). (3.13)

In particular, if we take f = 1lA in this equality, we obtain π(A) ≥ κ , thus π is not equal tozero. Finally, it can be shown that π is a probability and that (3.13) entails that π is an invariantprobability for the chain (Xt ) (see Exercise 3.7).

Page 84: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

68 GARCH MODELS

For some models, the drift criterion (iii) is too restrictive because it relies on transitions inonly one step. The following criterion, adapted from Meyn and Tweedie (1996, Theorems 19.1.3,6.2.9 and 6.2.5), is an interesting alternative relying on the transitions in n steps.

Theorem 3.2 (Geometric ergodicity criterion) Assume that:

(i) (Xt ) is an aperiodic Feller chain;

(ii) (Xt ) is φ-irreducible where the support of φ has nonempty interior;

(iii) there exist a compact C ⊂ E, an integer n ≥ 1 and a continuous function V : E→ R+satisfying

1 ≤ V (x), ∀x ∈ C, (3.14)

and for δ > 0 and b> 0,

E {V (Xt+n)|Xt−1 = x} ≤ (1− δ)V (x), ∀x /∈ C, (3.15)

E {V (Xt+n)|Xt−1 = x} ≤ b, ∀x ∈ C.

Then (Xt ) is geometrically ergodic.

The compact C of condition (iii) can be replaced by a small set but the function V must bebounded on C. When (Xt ) is not a Feller chain, a similar criterion exists, for which it is necessaryto consider such small sets (see Meyn and Tweedie, 1996, Theorem 19.1.3).

3.2 Mixing Properties of GARCH Processes

We begin with the ARCH(1) process because this is the only case where the process (εt ) isMarkovian.

The ARCH(1) Case

Consider the model {εt = σtηtσ 2t = ω + αε2

t−1,(3.16)

where ω> 0, α ≥ 0 and (ηt ) is a sequence of iid (0, 1) variables. The following theorem establishesthe mixing property of the ARCH(1) process under the necessary and sufficient strict stationaritycondition (see Theorem 2.1 and (2.10)). An extra assumption on the distribution of ηt is required,but this assumption is mild:

Assumption A The law Pη of the process (ηt ) is absolutely continuous, of density f with respectto the Lebesgue measure λ on (R,B(R)). We assume that

inf{η | η> 0, f (η)> 0} = inf{−η | η < 0, f (η)> 0} := η0, (3.17)

and that there exists τ > 0 such that(−η0 − τ,−η0) ∪ (η0, η0 + τ

) ⊂ {f > 0}.

Note that this assumption includes, in particular, the standard case where f is positive over aneighborhood of 0, possibly over all R. We then have η0 = 0. Equality (3.17) implies some (local)

Page 85: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MIXING 69

symmetry of the law of (ηt ). This symmetry facilitates the proof of the following theorem, but itcan be omitted (see Exercise 3.8).

Theorem 3.3 (Mixing of the ARCH(1) model) Under Assumption A and for

α < e−E log η2t , (3.18)

the nonanticipative strictly stationary solution of the ARCH(1) model (3.16) is geometricallyergodic, and thus geometrically β-mixing.

Proof. Let ψ(x) = (ω + αx2)1/2. A process (εt ) satisfying

εt = ψ(εt−1)ηt , t ≥ 1,

where ηt is independent of εt−i , i > 0, is clearly a homogenous Markov chain on (R,B(R)), withtransition probabilities

P(x, B) = P(ε1 ∈ B | ε0 = x) =∫

1ψ(x)

B

dPη(y).

We will show that the conditions of Theorem 3.1 are satisfied.

Step (i). We haveE {g(εt ) | εt−1 = x} = E

[g{ψ(x)ηt }

].

If g is continuous and bounded, the same is true for the function x → g{ψ(x)y}, for all y. By theLebesgue theorem, it follows that (εt ) is a Feller chain.

Step (ii). To show the φ-irreducibility of the chain, for some measure φ, assume for the momentthat η0 = 0 in Assumption A. Suppose, for instance, that f is positive on [0, τ ). Let φ be therestriction of the Lebesgue measure to the interval [0,

√ωτ). Since ψ(x) ≥ √ω, it can be seen

that

φ(B)> 0 ⇒ λ

{1

ψ(x)B ∩ [0, τ )

}> 0 ⇒ P(x, B)> 0.

It follows that the chain (εt ) is φ-irreducible. In particular, φ = λ if ηt has a positive densityover R.

The proof of the irreducibility in the case η0 > 0 is more difficult. First note that

E logαη2t =

∫(−∞,−η0] ∪ [η0,+∞)

log(αx2)f (x)dλ(x) ≥ logα(η0)2.

Now E logαη2t < 0 by (3.18). Thus we have

ρ := α(η0)2 < 1.

Let τ ′ ∈ (0, τ ) be small enough such that

ρ1 := α(η0 + τ ′)2 < 1.

Iterating the model, we obtain that, for ε0 = x fixed,

ε2t = ω(η2

t + αη2t η

2t−1 + · · · + αt−1η2

t . . . η21)+ αtη2

t . . . η21x

2.

Page 86: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

70 GARCH MODELS

It follows that the function

Yt = (η21, . . . , η

2t )→ Zt = (η2

1, . . . , η2t−1, ε

2t )

is a diffeomorphism between open subsets of Rt . Moreover, in view of Assumption A, the vectorYt has a density on Rt . The same is thus true for Zt , and it follows that, given ε0 = x,

the variable ε2t has a density with respect to λ. (3.19)

We now introduce the event

�t =t⋂

s=1

{ηs ∈

(−η0 − τ ′,−η0) ∪ [η0, η0 + τ ′

]}. (3.20)

Assumption A implies that P(�t )> 0. Conditional on �t , we have

ε2t ∈ It :=

[ω(η0)2 1− ρt

1− ρ+ ρtx2, ω(η0)2

1− ρt11− ρ1

+ ρt1x2].

Since the bounds of the interval It are reached, the intermediate value theorem and (3.19) entailthat, given ε0 = x, ε2

t has, conditionally on �t , a positive density on It . It follows that

εt has, conditionally on �t, a positive density on Jt (3.21)

where Jt = {x ∈ R | x2 ∈ It }. Let

I =[ω(η0)2

1− ρ,ω(η0 + τ ′)2

1− ρ1

], J = {x ∈ R | x2 ∈ I }

and let λJ be the restriction of the Lebesgue measure to J. We have

λJ (B)> 0 ⇒ ∃t, λ(B ∩ Jt )> 0

⇒ ∃t, P(εt ∈ B|ε0 = x) ≥ P(εt ∈ B | (ε0 = x) ∩�t)P(�t )> 0.

The chain (εt ) is thus φ-irreducible with φ = λJ .

Step (iii). We shall use Lemma 2.2. The variable αη2t is almost surely positive and satisfies

E(αη2t ) = α <∞ and E logαη2

t < 0, in view of assumption (3.18). Thus, there exists s > 0such that

c := αsµ2s < 1,

where µ2s = Eη2st . The proof of Lemma 2.2 shows that we can assume s ≤ 1. Let V (x) = 1+ x2s .

Condition (3.9) is obviously satisfied for all x. Let 0 < δ < 1− c and let the compact set

A = {x ∈ R;ωsµ2s + δ + (c − 1+ δ)x2s ≥ 0}.Since A is a nonempty closed interval with center 0, we have φ(A)> 0. Moreover, by the inequality(a + b)s ≤ as + bs for a, b ≥ 0 and s ∈ [0, 1] (see the proof of Corollary 2.3), we have, for x /∈ A,

E[V (εt )|εt−1 = x] ≤ 1+ (ωs + αsx2s )µ2s

= 1+ ωsµ2s + cx2s

< (1− δ)V (x),

which proves (3.10). It follows that the chain (εt ) is geometrically ergodic. Therefore, in view of(3.8), the chain obtained with the invariant law as initial measure is geometrically β-mixing. Theproof of the theorem is complete.

Page 87: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MIXING 71

Remark 3.1 (Case where the law of ηt does not have a density) The condition on the densityof ηt is not necessary for the mixing property. Suppose, for example, that η2

t = 1, a.s. (that is,ηt takes the values −1 and 1, with probability 1/2). The strict stationarity condition reduces toα < 1 and the strictly stationary solution is εt =

√ω

1−α ηt , a.s. This solution is mixing since it isan independent white noise.

Another pathological example is obtained when ηt has a mass at 0: P(ηt = 0) = θ > 0. Regard-less of the value of α, the process is strictly stationary because the right-hand side of inequality(3.18) is equal to +∞. A noticeable feature of this chain is the existence of regeneration timesat which the past is forgotten. Indeed, if ηt = 0 then εt = 0, εt+1 = √ωηt+1, . . . . It is easy to seethat the process is then mixing, regardless of α.

The GARCH(1, 1) Case

Let us consider the GARCH(1, 1) model{εt = σtηtσ 2t = ω + αε2

t−1 + βσ 2t−1,

(3.22)

where ω> 0, α ≥ 0, β ≥ 0 and the sequence (ηt ) is as in the previous section. In this case (σt ) isMarkovian, but (εt ) is not Markovian when β > 0. The following result extends Theorem 3.3.

Theorem 3.4 (Mixing of the GARCH(1, 1) model) Under Assumption A and if

E log(αη2t + β) < 0, (3.23)

then the nonanticipative strictly stationary solution of the GARCH(1, 1) model (3.22) is such thatthe Markov chain (σt ) is geometrically ergodic and the process (εt ) is geometrically β-mixing.

Proof. If α = 0 the strictly stationary solution is iid, and the conclusion of the theorem followsin this case. We now assume that α > 0. We first show the conclusions of the theorem that con-cern the process (σt ). A homogenous Markov chain (σt ) is defined on (R+,B(R+)) by setting,for t ≥ 1,

σ 2t = ω + a(ηt−1)σ

2t−1, (3.24)

where a(x) = αx2 + β. Its transition probabilities are given by

∀x > 0, ∀B ∈ B(R+), P (x, B) = P(σ1 ∈ B | σ0 = x) =∫Bx

dPη(y),

where Bx = {η; {ω + a(η)x2}1/2 ∈ B}. We show the stated results by checking the conditions ofTheorem 3.1.

Step (i). The arguments given in the ARCH(1) case, with

E {g(σt ) | σt−1 = x} = E[g{(ω + a(ηt )x

2)1/2}] ,are sufficient to show that (σt ) is a Feller chain.

Page 88: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

72 GARCH MODELS

Step (ii). To show the irreducibility, note that (3.23) implies

ρ := a(η0) < 1,

since |ηt | ≥ η0 a.s. and a(·) is an increasing function. Let τ ′ ∈ (0, τ ) be small enough such that

ρ1 := a(η0 + τ ′) < 1.

If σ0 = x ∈ R+, we have, for t > 0,

σ 2t = ω[1+ a(ηt−1)+ a(ηt−1)a(ηt−2)+ · · · + a(ηt−1) . . . a(η1)]+ a(ηt−1) . . . a(η0)x

2.

Conditionally on �t , defined in (3.20), we have

σ 2t ∈ It =

1− ρ+ ρt

(x2 − ω

1− ρ

),

ω

1− ρ1+ ρt1

(x2 − ω

1− ρ1

)].

Let

I = limt→∞It =

1− ρ,

ω

1− ρ1

].

Then, given σ0 = x,

σt has, conditionally on �t , a positive density on Jt

where Jt = {x ∈ R+ | x2 ∈ It }. Let λJ be the restriction of the Lebesgue measure toJ =

[√ω

1−ρ ,√

ω1−ρ1

]. We have

λJ (B)> 0 ⇒ ∃t, λ(B ∩ Jt )> 0

⇒ ∃t, P(σt ∈ B|σ0 = x) ≥ P(σt ∈ B | (σ0 = x) ∩�t)P(�t )> 0.

The chain (σt ) is thus φ-irreducible with φ = λJ .

Step (iii). We again use Lemma 2.2. By the arguments used in the ARCH(1) case, there existss ∈ [0, 1] such that

c1 := E{a(ηt−1)s} < 1.

Define the test function by V (x) = 1+ x2s , let 0 < δ < 1− c1 and let the compact set

A = {x ∈ R+; ωs + δ + (c1 − 1+ δ)x2s ≥ 0}.We have, for x /∈ A,

E[V (σt )|σt−1 = x] ≤ 1+ ωs + c1x2s

< (1− δ)V (x),

which proves (3.10). Moreover (3.9) is satisfied.To be able to apply Theorem 3.1, it remains to show that φ(A)> 0 where φ is the above

irreducibility measure. In view of the form of the intervals I and A, it is clear that, denoting by◦A the interior of A,

φ(A)> 0 ⇔√

ω

1− ρ∈ ◦A

⇔ ωs + δ + (c1 − 1+ δ)

1− ρ

)s> 0.

Page 89: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MIXING 73

Therefore, it suffices to choose δ sufficiently close to 1− c1 so that the last inequality is satisfied.For such a choice of δ, the compact set A satisfies the assumptions of Theorem 3.1. Consequently,the chain (σt ) is geometrically ergodic. Therefore the nonanticipative strictly stationary solution(σt ), satisfying (3.24) for t ∈ Z, is geometrically β-mixing.

Step (iv). Finally, we show that the process (εt ) inherits the mixing properties of (σt ). Sinceεt = σtηt , it is sufficient to show that the process Yt = (σt , ηt )

′ enjoys the mixing property. Itis clear that (Yt ) is a Markov chain on R+ × R equipped with the Borel σ -field. Moreover,(Yt ) is strictly stationary because, under condition (3.23), the strictly stationary solution (σt ) isnonanticipative, thus Yt is a function of ηt , ηt−1, . . . . Moreover, σt is independent of ηt . Thus thestationary law of (Yt ) can be denoted by PY = Pσ ⊗ Pη where Pσ denotes the law of σt and Pη thatof ηt . Let P t (y, ·) the transition probabilities of the chain (Yt ). We have, for y = (y1, y2) ∈ R+ × R,B1 ∈ B(R+), B2 ∈ B(R) and t > 0,

P t (y, B1 × B2) = P(σt ∈ B1, ηt ∈ B2 | σ0 = y1, η0 = y2)

= Pη(B2)P(σt ∈ B1 | σ0 = y1, η0 = y2)

= Pη(B2)P(σt ∈ B1 | σ1 = ω + a(y2)y1)

= Pη(B2)Pt−1(ω + a(y2)y1, B1).

It follows, since Pη is a probability, that

‖P t (y, ·)− PY (·)‖ = ‖P t−1(ω + a(y2)y1, ·)− Pσ (·)‖.

The right-hand side converges to 0 at exponential rate, in view of the geometric ergodicity of (σt ).It follows that (Yt ) is geometrically ergodic and thus β-mixing. The process (εt ) is also β-mixing,since εt is a measurable function of Yt .

Theorem 3.4 is of interest because it provides a proof of strict stationarity which is completelydifferent from that of Theorem 2.8. A slightly more restrictive assumption on the law of ηt hasbeen required, but the result obtained in Theorem 3.4 is stronger.

The ARCH(q) Case

The approach developed in the case q = 1 does not extend trivially to the general case because(εt ) and (σt ) lose their Markov property when p> 1 or q > 1. Consider the model{

εt = σtηtσ 2t = ω +∑q

i=1 αiε2t−i ,

(3.25)

where ω> 0, αi ≥ 0, i = 1, . . . , q, and (ηt ) is defined as in the previous section. We will onceagain use the Markov representation

zt= bt + Atzt−1, (3.26)

where

At =(

α1: q−1η2t αqη

2t

Iq−1 0

), bt = (ωη2

t , 0, . . . , 0)′, zt= (ε2

t , . . . , ε2t−q+1)

′.

Recall that γ denotes the top Lyapunov exponent of the sequence {At, t ∈ Z}.

Page 90: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

74 GARCH MODELS

Theorem 3.5 (Mixing of the ARCH(q) model) If ηt has a positive density on a neighborhoodof 0 and γ < 0, then the nonanticipative strictly stationary solution of the ARCH(q) model (3.25)is geometrically β-mixing.

Proof. We begin by showing that the nonanticipative and strictly stationary solution (zt) of the

model (3.26) is mixing. We will use Theorem 3.2 because a one-step drift criterion is not sufficient.Using (3.26) and the independence between ηt and the past of z

t, it can be seen that the process

(zt) is a Markov chain on (R+)q equipped with the Borel σ -field, with transition probabilities

∀x ∈ (R+)q, ∀B ∈ B((R+)q), P (x, B) = P(b1 + A1x ∈ B).The Feller property of the chain (z

t) is obtained by the arguments employed in the ARCH(1)

and GARCH(1, 1) cases, relying on the independence between ηt and the past of zt, as well as on

the continuity of the function x → bt + Atx.In order to establish the irreducibility, let us consider the transitions in q steps. Starting from

z0 = x, after q transitions the chain reaches a state zq

of the form

zq=

η2qψq(η

2q−1, . . . , η

21, x)

...

η21ψ1(x)

,

where the functions ψi are such that ψi(·) ≥ ω> 0. Let τ > 0 be such that the density f of ηt bepositive on (−τ, τ ), and let φ be the restriction to [0, ωτ 2[q of the Lebesgue measure λ on Rq .

It follows that, for all B = B1 × · · · × Bq ∈ B((R+)q ), φ(B)> 0 implies that, for allx, y1, . . . , yq ∈ (R+)q , and for all i = 1, . . . , q,

λ

{1

ψi(y1, . . . , yi−1, x)Bi ∩ [0, τ 2[

}> 0,

which implies in turn that, for all x ∈ (R+)q , Pq(x, B)> 0. We conclude that the chain (zt) is

φ-irreducible.The same argument shows that

φ(B)> 0 ⇒ ∀k > 0,∀x ∈ (R+)q, P q+k(x, B)> 0.

The criterion given in (3.4) can then be checked, which implies that the chain is aperiodic.We now show that condition (iii) of Theorem 3.2 is satisfied with the test function

V (x) = 1+ ‖x‖s ,where ‖ · ‖ denotes the norm ‖A‖ =∑ |Aij | of a matrix A = (Aij ) and s ∈ (0, 1) is such that

ρ := E(‖Ak0Ak0−1 . . . A1‖s ) < 1

for some integer k0 ≥ 1. The existence of s and k0 is guaranteed by Lemma 2.3. Iterating (3.26),we have

zk0= bk0

+k0−2∑k=0

Ak0 . . . Ak0−kbk0−k−1 + Ak0 . . . A1z0.

The norm being multiplicative, it follows that

‖zk0‖s ≤ ‖bk0

‖s +k0−2∑k=0

‖Ak0 . . . Ak0−k‖s‖bk0−k−1‖s + ‖Ak0 . . . A1‖s‖z0‖s .

Page 91: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MIXING 75

Thus, for all x ∈ (R+)p+q ,

E(V (zk0) | z0 = x) ≤ 1+ E‖bk0

‖s +k0−2∑k=0

E‖Ak0 . . . Ak0−k‖sE‖bk0−k−1‖s + ρ‖x‖s

:= K + ρ‖x‖s .

The inequality comes from the independence between At and bt−i for i > 0. The existence of theexpectations on the right-hand side of the inequality comes from arguments used to show (2.33).Let δ > 0 such that 1− δ >ρ and let C be the subset of (R+)p+q defined by

C = {x | (1− δ − ρ)‖x‖s ≤ K − (1− δ)}.

We have C = ∅ because K > 1− δ. Moreover, C is compact because 1− δ − ρ > 0. Condition(3.14) is clearly satisfied, V being greater than 1. Moreover, (3.15) also holds true for n = k0 − 1.We conclude that, in view of Theorem 3.2, the chain z

tis geometrically ergodic and, when it is

initialized with the stationary measure, the chain is stationary and β-mixing.Consequently, the process (ε2

t ), where (εt ) is the nonanticipative strictly stationary solutionof model (3.25), is β-mixing, as a measurable function of z

t. This argument is not sufficient to

conclude concerning (εt ). For k > 0, let

Y0 = f (. . . , ε−1, ε0), Zk = g(εk, εk+1, . . .),

where f and g are measurable functions. Note that

E(Y0 | ε2t , t ∈ Z) = E(Y0 | ε2

t , t ≤ 0, η2u, u ≥ 1) = E(Y0 | ε2

t , t ≤ 0).

Similarly, we have E(Zk | ε2t , t ∈ Z) = E(Zk | ε2

t , t ≥ k), and we have independence between Y0

and Zk conditionally on (ε2t ). Thus, we obtain

Cov(Y0, Zk) = E(Y0Zk)−E(Y0)E(Zk)

= E{E(Y0Zk | ε2t , t ∈ Z)}

−E{E(Y0 | ε2t , t ∈ Z)} E{E(Zk | ε2

t , t ∈ Z)}= E{E(Y0 | ε2

t , t ≤ 0)E(Zk | ε2t , t ≥ k)}

−E{E(Y0 | ε2t , t ≤ 0)} E{E(Zk | ε2

t , t ≥ k)}= Cov{E(Y0 | ε2

t , t ≤ 0), E(Zk | ε2t , t ≥ k)}

:= Cov{f1(. . . , ε2−1, ε

20), g1(ε

2k , ε

2k+1, . . .)}.

It follows, in view of the definition (A.5) of the strong mixing coefficients, that

αε(k) ≤ αε2(k).

In view of (A.6), we also have βε(k) ≤ βε2(k). Actually, (A.7) entails that the converse inequalitiesare always true, so we have αε2(k) = αε(k) and βε2(k) = βε(k). The theorem is thus shown. �

Page 92: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

76 GARCH MODELS

3.3 Bibliographical Notes

A major reference on ergodicity and mixing of general Markov chains is Meyn and Tweedie (1996).For a less comprehensive presentation, see Chan (1990), Tjøstheim (1990) and Tweedie (1998).For survey papers on mixing conditions, see Bradley (1986, 2005). We also mention the book byDoukhan (1994) which proposes definitions and examples of other types of mixing, as well asnumerous limit theorems.

For vectorial representations of the form (3.26), the Feller, aperiodicity and irreducibility prop-erties were established by Cline and Pu (1998, Theorem 2.2), under assumptions on the errordistribution and on the regularity of the transitions.

The geometric ergodicity and mixing properties of the GARCH(p, q) processes were estab-lished in the PhD thesis of Boussama (1998), using results of Mokkadem (1990) on polynomialprocesses. The proofs use concepts of algebraic geometry to determine a subspace of the stateson which the chain is irreducible. For the GARCH(1, 1) and ARCH(q) models we did not needsuch sophisticated notions. The proofs given here are close to those given in Francq and Zakoıan(2006a), which considers more general GARCH(1, 1) models. Mixing properties were obtained byCarrasco and Chen (2002) for various GARCH-type models under stronger conditions than the strictstationarity (for example, α + β < 1 for a standard GARCH(1, 1); see their Table 1). Recently,Meitz and Saikkonen (2008a, 2008b) showed mixing properties under mild moment assumptionsfor a general class of first-order Markov models, and applied their results to the GARCH(1, 1).

The mixing properties of ARCH(∞) models are studied by Fryzlewicz and Subba Rao (2009).They develop a method for establishing geometric ergodicity which, contrary to the approach ofthis chapter, does not rely on the Markov chain theory. Other approaches, for instance developedby Ango Nze and Doukhan (2004) and Hormann (2008), aim to establish probability properties(different from mixing) of GARCH-type sequences, which can be used to establish central limittheorems.

3.4 Exercises

3.1 (Irreducibility condition for an AR(1) process)Given a sequence (εt )t∈N of iid centered variables of law Pε which is absolutely continuouswith respect to the Lebesgue measure λ on R, let (Xt)t∈N be the AR(1) process defined by

Xt = θXt−1 + εt , t ≥ 0

where θ ∈ R.

(a) Show that if Pε has a positive density over R, then (Xt) constitutes a λ-irreducible chain.

(b) Show that if the density of εt is not positive over all R, the existence of an irreducibilitymeasure is not guaranteed.

3.2 (Equivalence between stationarity and invariance of the initial measure)Show the equivalence (3.3).

3.3 (Invariance of the limit law)Show that if π is a probability such that for all B, Pµ(Xt ∈ B)→ π(B) when t →∞, thenπ is invariant.

3.4 (Small sets for AR(1))For the AR(1) model of Exercise 3.1, show directly that if the density f of the error term ispositive everywhere, then the compacts of the form [−c, c], c > 0, are small sets.

Page 93: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MIXING 77

3.5 (From Feigin and Tweedie, 1985)For the bilinear model

Xt = θXt−1 + bεtXt−1 + εt , t ≥ 0,

where (εt ) is as in Exercise 3.1(a), show that if

E|θ + bεt | < 1,

then there exists a unique strictly stationary solution and this solution is geometrically ergodic.

3.6 (Lower bound for the empirical mean of the P t(x0, A))Show inequality (3.12).

3.7 (Invariant probability)Show the invariance of the probability π satisfying (3.13).Hints : (i) For a function g which is continuous and positive (but not necessarily with compactsupport), this equality becomes

lim infk→∞

∫E

g(y)Qnk (x0, dy) ≥∫E

g(y)π(dy)

(see Meyn and Tweedie, 1996, Lemma D.5.5).(ii) For all σ -finite measures µ on (R,B(R)) we have

∀B ∈ B(R), µ(B) = sup{µ(C); C ⊂ B, C compact}

(see Meyn and Tweedie, 1996, Theorem D.3.2).

3.8 (Mixing of the ARCH(1) model for an asymmetric density)Show that Theorem 3.3 remains true when Assumption A is replaced by the following:The law Pη is absolutely continuous, with density f, with respect to λ. There exists τ > 0such that (

η0− − τ, η0

−) ∪ (

η0+, η

0+ + τ

) ⊂ {f > 0},

where η0− = sup{η | η < 0, f (η)> 0} and η0

+ = inf{η | η> 0, f (η)> 0}.3.9 (A result on decreasing sequences)

Show that if un is a decreasing sequence of positive real numbers such that∑

n un <∞, wehave supn nun <∞. Show that this result applies to the proof of Corollary A.3 in Appendix A.

3.10 (Complements to the proof of Corollary A.3)Complete the proof of Corollary A.3 by showing that the term d4 is uniformly bounded in t ,h and k.

3.11 (Nonmixing chain)Consider the nonmixing Markov chain defined in Example A.3. Which of the assumptions(i)–(iii) in Theorem 3.1 does the chain satisfy and which does it not satisfy?

Page 94: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 95: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

4

Temporal Aggregation and WeakGARCH Models

Most financial series are analyzed at different frequencies (daily, weekly, monthly, . . . ). Theproperties of a series and, as a consequence, of the model fitted to the series, often cruciallydepend on the observation frequency. For instance, empirical studies generally find a strongerpersistence (that is, α + β closer to 1) in GARCH(1, 1) models, when the frequency increases.

For a given asset, observed at different frequencies, a natural question is whether strongGARCH models at different frequencies are compatible. If the answer is positive, the class ofGARCH models will be called stable by temporal aggregation . In this chapter, we consider,more generally, invariance properties of the class of GARCH processes with respect to timetransformations frequently encountered in econometrics. It will be seen that, to obtain stabilityproperties, a wider class of GARCH-type models, called weak GARCH and based on the L2

structure of the squared returns, has to be introduced.

4.1 Temporal Aggregation of GARCH Processes

Temporal aggregation arises when the frequency of data generation is lower than that of theobservations so that the underlying process is only partially observed. The time series resultingfrom temporal aggregation may of course have very different properties than the original timeseries. More formally, the temporal aggregation problem can be formulated as follows: givena process (Xt ) and an integer m, what are the properties of the sampled process (Xmt ) (that is,constructed from (Xt ) by only keeping every mth observation)? Does the aggregated process (Xmt)

belong to the same class of models as the original process (Xt )? If this holds for any (Xt ) andany m, the class is said to be stable by temporal aggregation.

An elementary example of a model that is stable by temporal aggregation is obviously whitenoise (strong or weak): the independence (or noncorrelation) property is kept for the aggregatedprocess, as well as the property of zero mean and fixed variance. On the other hand, ARMAmodels in the strong sense are generally not stable by temporal aggregation. It is only by relaxingthe assumption of noise independence, that is, by considering the class of weak ARMA models,that temporal aggregation holds.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 96: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

80 GARCH MODELS

We shall see that, like many strong models (based on an iid white noise), GARCH models in thestrong or semi-strong sense (that is, in the sense of Definition 2.1), are not stable by aggregation: aGARCH model at a given frequency is not compatible with a GARCH model at another frequency.As for ARMA models, temporal aggregation is obtained by enlarging the class of GARCH.

4.1.1 Nontemporal Aggregation of Strong ModelsTo show that temporal aggregation does not hold for GARCH models, it suffices to consider theARCH(1) example. Let (εt ) be the nonanticipative, second-order solution of the model:

εt = {ω + αε2t−1}1/2ηt , ω> 0, 0 < α < 1, (ηt ) iid (0, 1), E(η4

t ) = µ4 <∞.

The model satisfied by the even-numbered observations is easily obtained:

ε2t = {ω(1+ αη22t−1)+ α2ε2

2(t−1)η22t−1}1/2η2t .

It follows that {E(ε2t |ε2(t−1), ε2(t−2), . . .) = 0

Var(ε2t |ε2(t−1), ε2(t−2), . . .) = ω(1+ α)+ α2ε22(t−1),

because η2t and η2t−1 are independent of the variables involved in the conditioning. Thus, the pro-cess (ε2t ) is ARCH(1) in the semi-strong sense (Definition 2.1). It is a strong ARCH if the processdefined by dividing ε2t by its conditional standard deviation,

ηt = ε2t

{ω(1+ α)+ α2ε22(t−1)}1/2

,

is iid. We have seen that E(ηt |ε2(t−1), ε2(t−2), . . .) = 0 and E(η2t |ε2(t−1), ε2(t−2), . . .) = 1, but

E(η4t |ε2(t−1), ε2(t−2), . . .)

= µ4ω2(1+ 2α + α2µ4)+ 2ω(1+ αµ4)α

2ε22(t−1) + µ4α

4ε42(t−1)

{ω(1+ α)+ α2ε22(t−1)}2

= µ4

(1+ (µ4 − 1)α2(ω + αε2

2(t−1))2

{ω(1+ α)+ α2ε22(t−1)}2

).

If E(η4t |ε2(t−1), ε2(t−2), . . .) were a.s. constant, we would have α = 0 (no ARCH effect), or µ4 = 1

(η2t = 1, a.s.), or, for some constant K ,

ω + αε22(t−1)

ω(1+ α)+ α2ε22(t−1)

= K, a.s.,

the latter inequality implying that ε22(t−1) = K∗, a.s., for another constant K∗. By stationarity

ε2t = K∗, a.s., for all t and η2

t = ε2t {ω + αε2

t−1}−1 would take only one value, leading us again tothe case µ4 = 1. This proves that the process (ηt ) is not iid, whence α > 0 (presence of ARCH),and µ4 = 1 (nondegenerate law of η2

t ). The process (ε2t ) is thus not strong GARCH, although (εt )is strong GARCH.

It can be shown that this property extends to any integer m (Exercise 4.1).From this example, it might seem that strong GARCH processes aggregate in the class of

semi-strong GARCH processes. We shall now see that this is not the case.

Page 97: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TEMPORAL AGGREGATION AND WEAK GARCH MODELS 81

4.1.2 Nonaggregation in the Class of Semi-StrongGARCH Processes

Let (εt ) denote the nonanticipative, second-order stationary solution of the strong ARCH(2) model:

εt = {ω + α1ε2t−1 + α2ε

2t−2}1/2ηt , ω, α1, α2 > 0, α1 + α2 < 1,

under the same assumptions on (ηt ) as before. In view of (2.4), the AR(2) representation satisfiedby (ε2

t ) is

ε2t = ω + α1ε

2t−1 + α2ε

2t−2 + νt , (4.1)

where (νt ) is the strong innovation of (ε2t ). Using the lag operator, this model is written as

(1− λ1L)(1+ λ2L)ε2t = ω + νt ,

where λ1 and λ2 are real positive numbers (such that λ1 − λ2 = α1 and λ1λ2 = α2). Multiplyingthis equation by (1+ λ1L)(1− λ2L), we obtain

(1− λ21L

2)(1− λ22L

2)ε2t = ω(1+ λ1)(1− λ2)+ (1+ λ1L)(1− λ2L)νt ,

that is,(1− λ2

1L)(1− λ22L)y

2t = ω∗ + vt ,

where ω∗ = ω(1+ λ1)(1− λ2), vt = ν2t + (λ1 − λ2)ν2t−1 − λ1λ2ν2t−2 and yt = ε2t . Observe that(vt ) is an MA(1) process, such that

Cov(vt , vt−1)

= Cov {ν2t + (λ1 − λ2)ν2t−1 − λ1λ2ν2t−2, ν2t−2 + (λ1 − λ2)ν2t−3 − λ1λ2ν2t−4}= −λ1λ2Var(νt ).

It follows that (vt ) can be written as vt = ut − θut−1, where (ut ) is a white noise and θ is aconstant depending on λ1 and λ2. Finally, y2

t = ε22t has the following ARMA(2, 1) representation:

ε22t = ω∗ + (λ2

1 + λ22)ε

22(t−1) − λ2

1λ22ε

22(t−2) + ut − θut−1. (4.2)

The ARMA orders are compatible with a semi-strong GARCH(1, 2) model for (ε2t )t , with condi-tional variance:

σ 2t = Var(ε2

2t | ε22(t−1), ε

22(t−2), . . .)

= ω + α1ε22(t−1) + α2ε

22(t−2) + βσ 2

t−1, ω > 0, α1 ≥ 0, α2 ≥ 0, β ≥ 0.

If (ε2t )t were such a semi-strong GARCH(1, 2) process, the corresponding ARMA(2, 1) represen-tation would then be

ε22t = ω + (α1 + β)ε2

2(t−1) + α2ε22(t−2) + νt − βνt−1,

in view of (2.4). This equation is not compatible with (4.2), because of the sign of the coefficientof ε2

2(t−2). We can conclude that if (εt ) is a strong ARCH(2), (ε2t ) is never a semi-strong GARCH.

Page 98: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

82 GARCH MODELS

4.2 Weak GARCH

The previous example shows that the square of a process obtained by temporal aggregation ofa strong or semi-strong GARCH admits an ARMA representation. This leads us to the follow-ing definition.

Definition 4.1 (Weak GARCH process) A fourth-order stationary process (εt ) is said to be aweak GARCH(r, p) if:

(i) (εt ) is a white noise;

(ii) (ε2t ) admits an ARMA representation of the form

ε2t −

r∑i=1

aiε2t−i = c + νt −

p∑i=1

biνt−i (4.3)

where (νt ) is the linear innovation of (ε2t ).

Recall that the property of linear innovation entails that

Cov(νt , ε2t−k) = 0, ∀k > 0.

From (2.4), semi-strong GARCH(p, q) processes satisfy, under the fourth-order stationarity con-dition, Definition 4.1 with r = max(p, q). The linear innovation coincides in this case with thestrong innovation: νt is thus uncorrelated with any variable of the past of εt (provided this corre-lation exists).

Remark 4.1 (Generality of the weak GARCH class) Let (Xt) denote a strictly stationary,purely nondeterministic process, admitting moments up to order 4. By the Wold theorem, (Xt)

admits an MA(∞) representation. Suppose this representation is invertible and there exists anARMA representation of the form

Xt +P∑i=1

φiXt−i = εt +Q∑i=1

ψiεt−i

where (εt ) is a weak white noise with variance σ 2 > 0, and the polynomials �(z) = 1+φ1z+ · · · + φP z

P and �(z) = 1+ ψ1z+ · · · + ψQzQ have all their roots outside the unit disk

and have no common root. Without loss of generality, suppose that φP = 0 and ψQ = 0 (byconvention, φ0 = ψ0 = 1). The process (εt ) can then be interpreted as the linear innovationof (Xt ). The process (ε2

t )t∈Z is clearly second-order stationary and purely nondeterministic.It follows that it admits an MA(∞) representation by the Wold theorem. If this representation isinvertible, the process (εt ) is a weak GARCH process.

The class of weak GARCH processes is not limited to processes obtained by temporal aggregation.Before returning to temporal aggregation, we conclude this section with further examples of weakGARCH processes.

Example 4.1 (GARCH with measurement error) Suppose that a GARCH process (εt ) isobserved with a measurement error Wt . We have

εt = et +Wt, et = σtZt , σ 2t = c +

q∑i=1

aie2t−i +

p′∑i=1

biσ2t−i . (4.4)

Page 99: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TEMPORAL AGGREGATION AND WEAK GARCH MODELS 83

For simplicity, it can be assumed that the sequences (Zt ) and (Wt ) are mutually independent, iidand centered, with variances 1 and σ 2

W respectively.It can be shown (Exercise 4.3) that (εt ) is a weak GARCH process of the form

ε2t −

max{p,q}∑i=1

(ai + bi)ε2t−i = c +

(1−

max{p,q}∑i=1

ai + bi

)σ 2W + ut +

max{p,q}∑i=1

βiut−i

where the βi are different from the −bi , unless σW = 0. It is worth noting that the AR part of thisrepresentation is not affected by the presence of the perturbation Wt .

Statistical inference on GARCH with measurement errors is complicated because the likelihoodcannot be written in explicit form. Methods using least squares, the Kalman filter or simulationshave been suggested to estimate these models.

Example 4.2 (Quadratic GARCH) Consider the modification of the semi-strong GARCHmodel given by

E(εt |εt−1) = 0 and E(ε2t |εt−1) = σ 2

t =(c +

q∑i=1

aiεt−i

)2

+p∑i=1

biσ2t−i , (4.5)

where the constants bi are positive. Let ut = ε2t − σ 2

t . The ut are nonautocorrelated, uncorrelatedwith any variable of the future (by the martingale difference assumption) and, by definition, withany variable of the past of εt . Rewrite the equation for σ 2

t as

ε2t = c2 +

max{p,q}∑i=1

(a2i + bi)ε

2t−i + vt ,

where

vt = 2cq∑i=1

aiεt−i +∑i =j

aiaj εt−iεt−j + ut −p∑i=1

biut−i . (4.6)

It is not difficult to verify that (vt ) is an MA(max{p, q}) process (Exercise 4.4). It follows that(εt ) is a weak GARCH(max{p, q},max{p, q}) process.

Example 4.3 (Markov-switching GARCH) Markov-switching models (ARMA, GARCH)allow the coefficients to depend on a Markov chain, in order to take into account the changes ofregime in the dynamics of the series. The chain being unobserved, these models are also referredto as hidden Markov models.

The simplest Markov-switching GARCH model is obtained when a single parameter ω isallowed to depend on the Markov chain. More precisely, let (�t ) be a Markov chain with statespace 0, 1, . . . , K − 1. Suppose that this chain is homogenous, stationary, irreducible and aperiodic,and let pij = P [�t = j |�t−1 = i], for i, j = 0, 1, . . . , K − 1, be its transition probabilities. Themodel is given by

εt = σtηt , σ 2t = ω(�t )+

q∑i=1

aiε2t−i +

p∑i=1

biσ2t−i , (4.7)

with

ω(�t ) =K∑i=1

ωi 1l{�t=i−1}, 0 < ω1 < ω2 < . . . < ωK, (4.8)

Page 100: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

84 GARCH MODELS

where (ηt ) is an iid (0, 1) process with finite fourth-order moment, the sequence (ηt ) beingindependent of the sequence (�t ). Tedious computations show that (εt ) is a weakGARCH(max{p, q} +K − 1, p + K− 1) process of the form

K−1∏k=1

(1− λkL)

(I −

max{p,q}∑i=1

(ai + bi)Li

)ε2t = ω +

I + p+K−1∑i=1

βiLi

ut , (4.9)

where λ1, . . . , λK−1 are the eigenvalues different from 1 of the matrix P = (pji). The βi generallydo not have simple expressions as functions of the initial parameters, but can be numericallyobtained from the first autocorrelations of the process (ε2

t ) (Exercise 4.7).

Example 4.4 (Stochastic volatility model) An example of stochastic volatility model is given by

εt = σtηt , σ 2t = c + dσ 2

t−1 + (a + bσ 2t−1)vt , c, d, b > 0, a ≥ 0. (4.10)

where (ηt ) and (vt ) are iid (0, 1) sequences, with ηt independent of the vt−j , j ≥ 0. Note thatthe GARCH(1, 1) process is obtained by taking vt = Z2

t−1 − 1 and a = 0. Under the assump-tion d2 + b2 < 1, it can be shown (Exercise 4.5) that the autocovariance structure of (ε2

t ) ischaracterized by

Cov(ε2t , ε

2t−h) = dCov(ε2

t , ε2t−h+1), ∀h> 1.

It follows that (εt ) is a weak GARCH(1, 1) process with

ε2t − dε2

t−1 = c + ut + βut−1, (4.11)

where (ut ) is a weak white noise and β can be explicitly computed.

Example 4.5 (Contemporaneous aggregation of GARCH processes) It is standard in financeto consider linear combinations of several series (for instance, to define portfolios). If these seriesare GARCH processes, is their linear combination also a GARCH process? To simplify the pre-sentation, consider the sum of two GARCH(1, 1) processes, defined as the second-order stationaryand nonanticipative solutions of

εit = σitηit , σ 2it = ωi + αiε

2i,t−1 + βiσ

2i,t−1, ωi > 0, αi, βi ≥ 0, (ηit ) iid (0, 1), i = 1, 2,

and suppose that the sequences (η1t ) and (η2t ) are independent. Let εt = ε1t + ε2t . It is easy to seethat (εt ) is a white noise. We have, for h> 0, Cov(ε2

it , ε2j,t−h) = 0 for i = j , because the processes

(ε1t ) and (ε2t ) are independent. Moreover, for h> 0,

Cov(ε1t ε2t , ε21,t−h) = E(ε1t ε2t ε

21,t−h) = E(η1t )E(σ1t ε2t ε

21,t−h) = 0

because η1t is independent of the other variables. It follows that, for h> 0,

Cov(ε2t , ε

2t−h) = Cov(ε2

1t , ε21,t−h)+ Cov(ε2

2t , ε22,t−h). (4.12)

By formula (2.61), we deduce that

γε2(h) := Cov(ε2t , ε

2t−h) = γε2

1(1)(α1 + β1)

h−1 + γε22(1)(α2 + β2)

h−1, h ≥ 1. (4.13)

Page 101: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TEMPORAL AGGREGATION AND WEAK GARCH MODELS 85

If f is a function defined on the integers, denote by Lf the function such that Lf (h) = f (h− 1),h> 0. We have (1− βL)βh = 0 for h> 0. Relation (4.13) shows that

{1− (α1 + β1)L}{1− (α2 + β2)L}γε2(h) = 0, h> 2.

It follows that (ε2t ) is a weak GARCH(2, 2) process of the form

{1− (α1 + β1)L}{1− (α2 + β2)L}ε2t = ω + ut + θ1ut−1 + θ2ut−2,

where (ut ) is a noise. Since Eε2t = Eε2

1t + Eε22t , we obtain ω = {1− (α1 + β1)}ω2 + {1− (α2 +

β2)}ω1. Note that the orders obtained for the ARMA representation of ε2t are not necessarily the

minimum ones. Indeed, if α1 + β1 = α2 + β2, then γε2 (h) = {γε21(1)+ γε2

2(1)}(α1 + β1)

h−1, h ≥ 1,

from (4.13). Therefore, {1− (α1 + β1)L}γε2 (h) = 0 if h> 1. Thus (ε2t ) is a weak GARCH(1, 1)

process. This example can be generalized to higher-order GARCH models (Exercise 4.9).

Example 4.6 (β-ARCH process) Consider the conditionally heteroscedastic AR(1) modeldefined by

Xt = φXt−1 + (c + a|Xt−1|2β)1/2ηt , |φ| < 1, c > 0, a ≥ 0,

where (ηt ) is an iid (0, 1) symmetrically distributed sequence. A difference between this model,called β-ARCH, and the standard ARCH is that the conditional variance of Xt is specified as afunction of Xt−1, not as a function of the noise.

Suppose β = 1 and letεt = (c + aX2

t−1)1/2ηt .

We have

ε2t = c + a

∑i≥1

φi−1εt−i

2

+ ut ,

where ut = ε2t − E(ε2

t |εt−1). By expanding the squared term we obtain the representation

[1− (φ2 + a)L]ε2t = c(1− φ2)+ vt − φ2vt−1,

where vt = a∑

i,j≥1,i =j φi+j−2εt−iεt−j + ut . Note that the process (vt − φ2vt−1) is MA(1).

Consequently, (ε2t ) is an ARMA(1, 1) process. Finally, the process (Xt ) admits a weak AR(1)-

GARCH(1, 1) representation.

4.3 Aggregation of Strong GARCH Processesin the Weak GARCH Class

We have seen that the class of semi-strong GARCH models (defined in terms of conditionalmoments) is not large enough to include all processes obtained by temporal aggregation of strongGARCH. In this section we show that the weak GARCH class of models is stable by temporalaggregation. Before dealing with the general case, we consider the GARCH(1, 1) model, for whichthe solution is more explicit.

Theorem 4.1 (Aggregation of the GARCH(1, 1) process) Let (εt ) be a weak GARCH(1, 1)process. Then, for any integer m ≥ 1, the process (εmt ) is also a weak GARCH(1, 1) process. Theparameters of the ARMA representations

ε2t − aε2

t−1 = c + νt − bνt−1 and ε2mt − a(m)ε

2m(t−1) = c(m) + ν(m),t − b(m)ν(m),t−1

Page 102: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

86 GARCH MODELS

are related by the relations

a(m) = am, c(m) = c1− am

1− a,

b(m)

1+ b2(m)

= am−1b(1− a2)

(1− a2)(1+ b2a2(m−1))+ (a − b)2(1− a2(m−1)).

Proof. First note that, (ε2t ) being stationary by assumption, and (νt ) being its linear innovation,

a is strictly less than 1 in absolute value. Now, if (εt ) is a white noise, (εmt ) is also a white noise.By successive substitutions we obtain

ε2t = c(1+ a + . . .+ am−1)+ amε2

t−m + vt (4.14)

where vt = νt + (a − b)[νt−1 + aνt−2 + . . .+ am−2νt−m−1]− am−1bνt−m. Because (νt ) is a noise,we have

Cov(vt , vt−mk) = 0, ∀k > 1.

Hence, (vmt )t∈Z is an MA(1) process, from which it follows that (εmt ) is an ARMA(1, 1) process.The constant term and the AR coefficient of this representation appear directly in (4.14), whereasthe MA coefficient is obtained as the solution, of absolute value less than 1, of

b(m)

1+ b2(m)

= −Cov(vt , vt−m)Var(vt )

= am−1b

1+ (a − b)2(1+ a2 + . . .+ a2(m−2))+ a2(m−1)b2

which, after simplification, gives the claimed formula. �

Note, in particular, that the aggregate of an ARCH(1) process is another ARCH(1): b = 0 �⇒b(m) = 0.

It is also worth noting that am tends to 0 when m tends to infinity, thus a(m) and b(m) also tendto 0. In other words, the conditional heteroscedasticity tends to vanish by temporal aggregation ofGARCH processes. This conforms to the empirical observation that low-frequency series (weekly,monthly) display less ARCH effect than daily series, for instance.

The previous result can be straightforwardly extended to the GARCH(1, p) case. Denote by[x] the integer part of x.

Theorem 4.2 (Aggregation of the GARCH(1, p) process) Let (εt ) be a weak GARCH(1, p)process. Then, for any integer m ≥ 1, the process (εmt ) is a weak GARCH

(1, 1+ [p−1

m

])process.

Proof. In the proof of Theorem 4.1, equation (4.14) remains valid subject to a modificationof the definition of vt . Introduce the lag polynomial Q(L) = 1− b1L− · · · − bpL

p . We havevt = Q(L))[1+ aL+ . . .+ am−1Lm−1]νt . Thus, because (νt ) is a noise,

Cov(vt , vt−mk) = 0, ∀k > 1+[p − 1

m

].

Hence, (vmt ) is an MA(1+ [p−1

m

])process, and the conclusion follows. �

It can be seen from (4.14) that the constant term and the AR coefficient of the ARMA represen-tation of (εmt ) are the same as in Theorem 4.1. The coefficients of the MA part can be determinedthrough the first 2+ [p−1

m

]autocovariances of the process (vmt ).

Page 103: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TEMPORAL AGGREGATION AND WEAK GARCH MODELS 87

Note that temporal aggregation always entails a reduction of the MA order in the ARMArepresentation (except when p = 1, for which it remains equal to 1) – all the more so as m

increases. Let us now turn to the general case.

Theorem 4.3 (Aggregation of the GARCH(r, p) process) Let (εt ) be a weak GARCH(r, p)process. Then for any integer m ≥ 1, the process (εmt ) is a weak GARCH

(r, r + [p−r

m

])process.

Proof. Denote by λi (1 ≤ i ≤ r) the inverses of the complex roots of the AR polynomial of theARMA representation for (ε2

t ). Write model (4.3) in the form

r∏i=1

(1− λiL)(ε2t − µ) = Q(L)νt ,

where µ = Eε2t and Q(L) = 1−∑p

i=1 biLi . Applying the operator

∏ri=1(1+ λiL+ · · · +

λm−1i Lm−1) to this equation we get

r∏i=1

(1− λmi Lm)(ε2

t − µ) =r∏i=1

(1+ λiL+ · · · + λm−1i Lm−1)Q(L)νt .

Consider now the process(Z(m)t

)defined by Z

(m)t = ε2

mt − µ. We have the model

r∏i=1

(1− λmi L)Z(m)t =

r∏i=1

(1+ λiL+ · · · + λm−1i Lm−1)Q(L)νmt := vt ,

with the convention that Lνmt = νmt−1. Observe that vt = f (νmt , νmt−1, . . . , νmt−r(m−1)−p). Thissuffices to show that the process (vt ) is a moving average. The largest index k for which vtand vt−k have a common term νi is such that r(m− 1)+ p −m < mk ≤ r(m− 1)+ p. Thusk = r + [p−r

m

], which gives the order of the moving average part, and subsequently the orders of

the ARMA representation for ε2mt . �

This proof suggests the following scheme for deriving the exact form of the ARMArepresentation:

(i) The AR coefficients are deduced from the roots of the AR polynomial; but in the previousproof we saw that these roots, for the aggregate process, are the mth powers of the rootsof the initial AR polynomial.

(ii) The constant term immediately follows from the AR coefficients and the expectation ofthe process: E(ε2

mt) = E(ε2t ).

(iii) The derivation of the MA part is more tedious and requires computing the first r +[p−rm

]autocovariances of the process (ε2

mt ); these autocovariances follow from the ARMArepresentation for ε2

t .

An alternative method involves multiplying the ARMA representation of (ε2t ), written in

polynomial form, by a well-chosen lag polynomial so as to directly obtain the AR part of theARMA representation of (ε2m

t ). Let P(L) =∏ri=1(1− λiL) denote the AR polynomial of the

representation of (ε2t ). The AR polynomial of the representation of (ε2m

t ) is given by

P(L)

r∏i=1

(1+ λiL+ . . . λm−1i Lm−1) =

r∏i=1

(1− λmi Lm).

Page 104: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

88 GARCH MODELS

Example 4.7 (Computation of a weak GARCH representation) Consider the GARCH(2, 1)process defined by {

εt = σtηt , (ηt ) iid (0, 1)σ 2t = 1+ 0.1ε2

t−1 + 0.1ε2t−2 + 0.2σ 2

t−1,

and let us derive the weak GARCH representation of the process (ε2t ).

The ARMA representation of (ε2t ) is written as

ε2t = 1+ 0.3ε2

t−1 + 0.1ε2t−2 + νt − 0.2νt−1,

that is,(1− 0.5L)(1+ 0.2L)ε2

t = 1+ (1− 0.2L)νt .

Multiplying this equation by (1+ 0.5L)(1− 0.2L), we obtain

(1− 0.25L2)(1− 0.04L2)ε2t = 1.2+ (1+ 0.5L)(1− 0.2L)2νt .

Set vt = (1+ 0.5L)(1− 0.2L)2νt . The process (v2t ) is MA(1), v2t = ut − θut−1, where θ = 0.156is the solution, with absolute value less than 1, of the equation

θ

1+ θ2= −Cov(vt , vt−2)

Var(vt )= 0.16− 0.02× 0.1

1+ 0.12 + 0.162 + 0.022= 0.1525.

The weak GARCH(2, 1) representation of the process (ε2t ) is then

ε22t = 1.2+ 0.29ε2

2(t−1) − 0.01ε22(t−2) + ut − 0.156ut−1.

Observe that the sign of the coefficient of ε22(t−2) is not compatible with a strong or semi-strong

GARCH.

4.4 Bibliographical Notes

The main results concerning the temporal aggregation of GARCH models were established byDrost and Nijman (1993). It should be noted that our definition of weak GARCH models is notexactly the same as theirs: in Definition 4.1, the noise (νt ) is not the strong innovation of (ε2

t ), butonly the linear one. Drost and Werker (1996) introduced the notion of the continuous-time GARCHprocess and deduced the corresponding weak GARCH models at the different frequencies (see alsoDrost, Nijman and Werker, 1998). The problem of the contemporaneous aggregation of independentGARCH processes was studied by Nijman and Sentana (1996).

Model (4.5) belongs to the class of quadratic ARCH models introduced by Sentana(1995). GARCH models observed with measurement errors are dealt with by Harvey, Ruiz andSentana (1992), Gourieroux, Monfort and Renault (1993) and King, Sentana and Wadhwani (1994).Example 4.4 belongs to the class of stochastic autoregressive volatility (SARV) models introducedby Andersen (1994). The β-ARCH model was introduced by Diebolt and Guegan (1991).

Markov-switching ARMA(p, q) models were introduced by Hamilton (1989). Pagan andSchwert (1990) considered a variant of such models for modeling the conditional variance offinancial series. Model (4.7) was studied by Cai (1994) and Dueker (1997); see also Hamilton andSusmel (1994). The probabilistic properties of the Markov-switching GARCH models were studiedby Francq, Roussignol and Zakoıan (2001). The existence of ARMA representations for powersof ε2

t (as in (4.9)) was established by Francq and Zakoıan (2005), and econometric applications ofthis property were studied by Francq and Zakoıan (2008).

The examples of weak GARCH models discussed in this chapter were analyzed by Francq andZakoıan (2000), where a two-step least-squares method of estimation of weak ARMA-GARCHwas also proposed.

Page 105: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TEMPORAL AGGREGATION AND WEAK GARCH MODELS 89

4.5 Exercises

4.1 (Aggregate strong ARCH(1) process)Show that the process (εmt ) obtained by temporal aggregation of a strong ARCH(1) process(εt ) is a semi-strong ARCH.

4.2 (Aggregate weak GARCH(1, 2) process)State the equivalent of Theorem 4.1 for a GARCH(1, 2) process.

4.3 (GARCH with measurement error)Show that in Example 4.1 we have Cov(ε2

t , ε2t−h) = Cov(e2

t , e2t−h), for all h> 0. Use this result

to deduce the weak GARCH representation of (εt ).

4.4 (Quadratic ARCH)Verify that the process (vt ) defined in (4.6) is an MA(max{p, q}) process.

4.5 (Stochastic volatility model)In model (4.10), the volatility equation can be written as

σ 2t = A(vt )+ B(vt )σ

2t−1,

where A(vt ) = c + avt , B(vt ) = d + bvt . Suppose that d2 + b2 < 1.

1. Show that

Cov(ε2t , ε

2t−h) = Cov(B(vt )σ

2t−1, A(vt−h)+ B(vt−h)σ 2

t−h−1), ∀h> 0.

2. Express σ 2t−1 as a function of σ 2

t−h and of the process (vt ) and deduce that, for all h> 0,

Cov(ε2t , ε

2t−h) = [E{B(vt )}]h[2Cov{A(vt ), B(vt )}Eσ 2

t + Var{A(vt )}+Var{B(vt )}(Eσ 2

t )2 + Var(σ 2

t )E{B(vt )2}].

3. Using the second-order stationarity of (σ 2t ), compute E(σ 2

t ) and Var(σ 2t ) and determine

Cov(ε2t , ε

2t−h) for h> 0.

4. Conclude that (4.11) holds and explain how to obtain β.

4.6 (Independent-switching model)Consider model (4.7)–(4.8) in the particular case where the chain (�) is an iid process (thatis, when p(i, j) does not depend on i, for any (i, j)). Give a more explicit form for the weakGARCH representation (4.9).

4.7 (A two-regime Markov-switching model without ARCH coefficients)In model (4.7)–(4.8), suppose that p = q = 0 (that is, σ 2

t = ω(�t )) and take for (�t ) atwo-state Markov chain with 0 < p01 < 1, 0 < p10 < 1. Let π(i) = P(�t = i). Denote byp(k)(i, j) the k-step transition probabilities, that is, the entries of Pk . Set λ = p(1, 1)+p(2, 2)− 1.

1. Compute Eε2t .

2. Show that, for i, j = 1, 2,

p(k)(i, j)− π(j) = λk[{1− π(j)} 1l{i=j} −π(j) 1l{i =j}

]. (4.15)

Page 106: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

90 GARCH MODELS

3. Deduce that, for k > 0,

Cov(ε2t , ε

2t−k) = λk{ω1 − ω2}2π(1)π(2).

4. Compute Var(ε2t ).

5. Deduce that ε2t has an ARMA(1, 1) representation and determine the AR coefficient.

6. Simplify this representation in the case p01 + p10 = 1.

7. Determine numerically the ARMA(1, 1) representation for the model:

εt = σtηt , σ 2t = 0.21 1l�t=1+46.0 1l�t=2, p(1, 1) = 0.98, p(2, 1) = 0.38,

where ηt ∼ N(0, 1).

4.8 (Bilinear model)Let εt = ηtηt−1, where (ηt ) is a strong white noise with unit variance such that E(η8

t ) <∞.

1. Show that the process (εt ) is a weak GARCH.

2. Show that the process (ε2t − 1) is a weak ARMA-GARCH.

4.9 (Contemporaneous aggregation)Using the method of the proof of Theorem 4.3, generalize Example 4.5 by considering thecontemporaneous aggregation of independent strong GARCH processes of any orders.

Page 107: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Part II

Statistical Inference

Page 108: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 109: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

5

Identification

In this chapter, we consider the problem of selecting an appropriate GARCH or ARMA-GARCHmodel for given observations X1, . . . , Xn of a centered stationary process. A large part of thetheory of finance rests on the assumption that prices follow a random walk. The price variationprocess, X = (Xt ), should thus constitute a martingale difference sequence, and should coincidewith its innovation process, ε = (εt ). The first question addressed in this chapter, in Section 5.1,will be the test of this property, at least a consequence of it: absence of correlation. The problemis far from trivial because standard tests for noncorrelation are actually valid under an indepen-dence assumption. Such an assumption is too strong for GARCH processes which are dependentthough uncorrelated.

If significant sample autocorrelations are detected in the price variations – in other words, ifthe random walk assumption cannot be sustained – the practitioner will try to fit an ARMA(P,Q)model to the data before using a GARCH(p, q) model for the residuals. Identification of theorders (P,Q) will be treated in Section 5.2, identification of the orders (p, q) in Section 5.3.Tests of the ARCH effect (and, more generally, Lagrange multiplier tests) will be considered inSection 5.4.

5.1 Autocorrelation Check for White Noise

Consider the GARCH(p, q) modelεt = σtηt

σ 2t = ω +

q∑i=1

αiε2t−i +

p∑j=1

βjσ2t−j

(5.1)

with (ηt ) a sequence of iid centered variables with unit variance, ω> 0, αi ≥ 0 (i = 1, . . . , q),βj ≥ 0 (j = 1, . . . , p). We saw in Section 2.2 that, whatever the orders p and q, the nonantici-pative second-order stationary solution of (5.1) is a white noise, that is, a centered process whosetheoretical autocorrelations ρ(h) = Eεtεt+h/Eε2

t satisfy ρ(h) = 0 for all h = 0.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 110: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

94 GARCH MODELS

Given observations ε1, . . . , εn, the theoretical autocorrelations of a centered process (εt ) aregenerally estimated by the sample autocorrelations (SACRs)

ρ(h) = γ (h)

γ (0), γ (h) = γ (−h) = n−1

n−h∑t=1

εtεt+h, (5.2)

for h = 0, 1, . . . , n− 1. According to Theorem 1.1, if (εt ) is an iid sequence of centered randomvariables with finite variance then

√nρ(h)

L→ N (0, 1) ,

for all h = 0. For a strong white noise, the SACRs thus lie between the confidence bounds±1.96/

√n with a probability of approximately 95% when n is large. In standard software, these

bounds at the 5% level are generally displayed with dotted lines, as in Figure 5.2. These signifi-cance bands are not valid for a weak white noise, in particular for a GARCH process (Exercises5.3 and 5.4). Valid asymptotic bands are derived in the next section.

5.1.1 Behavior of the Sample Autocorrelationsof a GARCH Process

Let ρm = (ρ(1), . . . , ρ(m))′ denote the vector of the first m SACRs, based on n observations ofthe GARCH(p, q) process defined by (5.1). Let γm = (γ (1), . . . , γ (m))′ denote a vector of sampleautocovariances (SACVs).

Theorem 5.1 (Asymptotic distributions of the SACVs and SACRs) If (εt ) is the nonanticipa-tive and stationary solution of the GARCH(p, q) model (5.1) and if Eε4

t <∞, then, when n→∞,

√nγm

L→ N (0, γm

)and

√nρm

L→ N (0,ρm := {Eε2t }−2γm

),

where

γm =

Eε2

t ε2t−1 Eε2

t εt−1εt−2 . . . Eε2t εt−1εt−m

Eε2t εt−1εt−2 Eε2

t ε2t−2

...

.... . .

Eε2t εt−1εt−m · · · Eε2

t ε2t−m

is nonsingular. If the law of ηt is symmetric then γm is diagonal.

Note that ρm = Im when (εt ) is a strong white noise, in accordance with Theorem 1.1.

Proof. Let γm = (γ (1), . . . , γ (m))′, where γ (h) = n−1∑nt=1 εt εt−h. Since, for m fixed,

∥∥√nγm −√nγm∥∥2 =1√n

m∑h=1

E

(h∑t=1

εt εt−h

)2

1/2

≤ 1√n

m∑h=1

h∑t=1

‖εt‖24 → 0

as n→∞, the asymptotic distribution of√nγm coincides with that of

√nγm. Let h and k belong

to {1, . . . , m}. By stationarity,

Page 111: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 95

Cov{√

nγ (h),√nγ (k)

} = 1

n

n∑t,s=1

Cov (εt εt−h, εsεs−k)

= 1

n

n−1∑ =−n+1

(n− | |)Cov (εt εt−h, εt+ εt+ −k)

= Eε2t εt−hεt−k

because

Cov (εt εt−h, εt+ εt+ −k) ={Eε2

t εt−hεt−k if = 0,0 otherwise.

From this, we deduce the expression for γm . From the Cramer–Wold theorem,1 the asymptoticnormality of

√nγm will follow from showing that for all nonzero λ = (λ1, . . . , λm)

′ ∈ Rm,

√nλ′γm

L→ N(0, λ′γmλ). (5.3)

Let Ft denote the σ -field generated by {εu, u ≤ t}. We obtain (5.3) by applying a central limittheorem (CLT) to the sequence (εt

∑mi=1 λiεt−i ,Ft )t , which is a stationary, ergodic and square

integrable martingale difference (see Corollary A.1).The asymptotic behavior of ρm immediately follows from that of γm(as in Exercise 5.3).Reasoning by contradiction, suppose that γm is singular. Then, because this matrix is the

covariance of the vector (εt εt−1, . . . , εt εt−m)′, there exists an exact linear combination of thecomponents of (εt εt−1, . . . , εt εt−m)′ that is equal to zero. For some i0 ≥ 1, we then have εtεt−i0 =∑m

i=i0+1 λiεt εt−i , that is, εt−i0 1l{ηt =0} =∑m

i=i0+1 λiεt−i 1l{ηt =0}. Hence,

Eε2t−i0 1l{ηt =0} =

m∑i=i0+1

λiE(εt−i0εt−i 1l{ηt =0})

=m∑

i=i0+1

λiE(εt−i0εt−i )P(ηt = 0) = 0

which is absurd. It follows that γm is nonsingular.When the law of ηt is symmetric, the diagonal form of γm is a consequence of property

(7.24) in Chapter 7. See Exercises 5.5 and 5.6 for the GARCH(1, 1) case. �

A consistent estimator γm of γm is obtained by replacing the generic term of γm by

n−1n∑t=1

ε2t εt−iεt−j ,

with, by convention, εs=0 for s < 1. Clearly, ρm := γ−2(0)γm is a consistent estimator of ρm

and is almost surely invertible for n large enough. This can be used to construct asymptoticsignificance bands for the SACRs of a GARCH process.

1 For any sequence (Zn) of random vectors of size d, ZnL→ Z if and only if, for all λ ∈ Rd , we have

λ′ZnL→ λ′Z.

Page 112: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

96 GARCH MODELS

Practical Implementation

The following R code allows us to draw a given number of autocorrelations ρ(i) and the

significance bands ±1.96√ρm(i, i)/n.

# autocorrelation functiongamma<-function(x,h){ n<-length(x); h<-abs(h);x<-x-mean(x)+ gamma<-sum(x[1:(n-h)]*x[(h+1):n])/n }rho<-function(x,h) rho<-gamma(x,h)/gamma(x,0)

# acf function with significance bands of a strong white noisenl.acf<-function(x,main=NULL,method=’NP’){+ n<-length(x); nlag<-as.integer(min(10*log10(n),n-1)) +acf.val<-sapply(c(1:nlag),function(h) rho(x,h)) + x2<-x^2 +var<-1+(sapply(c(1:nlag),function(h) gamma(x2,h)))/gamma(x,0)^2 +band<-sqrt(var/n) +minval<-1.2*min(acf.val,-1.96*band,-1.96/sqrt(n)) +

maxval<-1.2*max(acf.val,1.96*band,1.96/sqrt(n)) +acf(x,xlab=’Lag’,ylab=’SACR’,ylim=c(minval,maxval),main=main) +lines(c(1:nlag),-1.96*band,lty=1,col=’red’) +lines(c(1:nlag),1.96*band,lty=1,col=’red’) }

In Figure 5.1 we have plotted the SACRs and their significance bands for daily series ofexchange rates of the dollar, pound, yen and Swiss franc against the euro, for the period fromJanuary 4, 1999 to January 22, 2009. It can be seen that the SACRs are often outside the standardsignificance bands ±1.96/

√n, which leads us to reject the strong white noise assumption for all

these series. On the other hand, most of the SACRs are inside the significance bands shown assolid lines, which is in accordance with the hypothesis that the series are realizations of semi-strongwhite noises.

100 155 0 520 25 30 35

−0.0

50.

05

Lag

Sam

ple

auto

corr

elat

ion

Yen

10 15 20 25 30 35

−0.0

60.

000.

06

Lag

Sam

ple

auto

corr

elat

ion

Dollar

0 5 10 15 20 25 30 35

−0.0

50.

05

Lag

0 5 10 15 20 25 30 35

Lag

Sam

ple

auto

corr

elat

ion

Sterling

−0.1

00.

000.

10

Sam

ple

auto

corr

elat

ion

SWF

Figure 5.1 SACR of exchange rates against the euro, standard significance bands for the SACRsof a strong white noise (dotted lines) and significance bands for the SACRs of a semi-strong whitenoise (solid lines).

Page 113: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 97

5.1.2 Portmanteau TestsThe standard portmanteau test for checking that the data is a realization of a strong white noise isthat of Ljung and Box (1978). It involves computing the statistic

QLBm := n(n+ 2)

m∑i=1

ρ2(i)/(n − i)

and rejecting the strong white noise hypothesis if QLBm is greater than the (1− α)-quantile of

a χ2m.2

Portmanteau tests are constructed for checking noncorrelation, but the asymptotic distributionof the statistics is no longer χ2

m when the series departs from the strong white noise assumption.For instance, these tests are not robust to conditional heteroscedasticity. In the GARCH framework,we may wish to simultaneously test the nullity of the first m autocorrelations using more robustportmanteau statistics.

Theorem 5.2 (Corrected portmanteau test in the presence of ARCH) Under the assumptionsof Theorem 5.1, the portmanteau statistic

Qm = nρ′m−1ρmρm

has an asymptotic χ2m distribution.

Proof. Ii suffices to use Theorem 5.1 and the following result: if XnL→ N(0, ), with non-

singular, and if n → in probability, then X′n−1n Xn

L→ χ2m. �

A portmanteau test of asymptotic level α based on the first m SACRs involves rejecting thehypothesis that the data are generated by a GARCH process if Qm is greater than the (1− α)-quantile of a χ2

m.

5.1.3 Sample Partial Autocorrelations of a GARCHDenote by rm (rm) the vector of the m first partial autocorrelations (sample partial autocorrelations(SPACs)) of the process (εt ). By Theorem B.3, we know that for a weak white noise, the SACRsand SPACs have the same asymptotic distribution. This applies in particular to a GARCH process.Consequently, under the hypothesis of GARCH white noise with a finite fourth-order moment,consistent estimators of rm are

(1)rm= ρm or

(2)rm= Jmρm J

′m,

where Jm is the matrix obtained by replacing ρX(1), . . . , ρX(m) by ρX(1), . . . , ρX(m) in theJacobian matrix Jm of the mapping ρm �→ rm, and ρm is the consistent estimator of ρm definedafter Theorem 5.1.

Although it is not current practice, one can test the simultaneous nullity of several theoreticalpartial autocorrelations using portmanteau tests based on the statistics

Qr,BPm = nr ′mrm and Qr

m = nr ′m((i)ρm

)−1rm

2 The asymptotic distribution of QLBm is χ2

m. The Box and Pierce (1970) statistic QBPm := n

∑mi=1 ρ

2(i) hasthe same asymptotic distribution, but the QLB

m statistic is believed to perform better for finite samples.

Page 114: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

98 GARCH MODELS

2 4 6 8 10 12

−0.06

−0.04

−0.02

0.02

0.04

0.06

2 4 6 8 10 12 h h

−0.06

−0.04

−0.02

0.02

0.04

0.06

Figure 5.2 SACRs of a simulation of a strong white noise (left) and of the GARCH(1, 1) whitenoise (5.4) (right). Approximately 95% of the SACRs of a strong white noise should lie inside thethin dotted lines ±1.96/

√n. Approximately 95% of the SACRs of a GARCH(1, 1) white noise

should lie inside the thick dotted lines.

with, for instance, i = 2. From Theorem B.3, under the strong white noise assumption, the statisticsQr,BPm , QBP

m and QLBm have the same χ2

m asymptotic distribution. Under the hypothesis of a pureGARCH process, the statistics Qr

m and Qm also have the same χ2m asymptotic distribution.

5.1.4 Numerical IllustrationsStandard Significance Bounds for the SACRs are not Valid

The right-hand graph of Figure 5.2 displays the sample correlogram of a simulation of size n =5000 of the GARCH(1, 1) white noise{

εt = σtηtσ 2t = 1+ 0.3ε2

t−1 + 0.55σ 2t−1,

(5.4)

where (ηt ) is a sequence of iid N(0, 1) variables. It is seen that the SACRs of order 2 and 4 aresharply outside the 95% significance bands computed under the strong white noise assumption. Aninexperienced practitioner could be tempted to reject the hypothesis of white noise, in favor of amore complicated ARMA model whose residual autocorrelations would lie between the significancebounds ±1.96/

√n. To avoid this type of specification error, one has to be conscious that the bounds

±1.96/√n are not valid for the SACRs of a GARCH white noise. In our simulation, it is possible

to compute exact asymptotic bounds at the 95% level (Exercise 5.4). In the right-hand graph ofFigure 5.2, these bounds are drawn in thick dotted lines. All the SACRs are now inside, or veryslightly outside, those bounds. If we had been given the data, with no prior information, thisgraph would have given us no grounds on which to reject the simple hypothesis that the data is arealization of a GARCH white noise.

Estimating the Significance Bounds of the SACRs of a GARCH

Of course, in real situations the significance bounds depend on unknown parameters, and thuscannot be easily obtained. It is, however, possible to estimate them in a consistent way, as describedin Section 5.1.1. For a simulation of model (5.4) of size n = 5000, Figure 5.3 shows as thin dottedlines the estimation thus obtained of the significance bounds at the 5% level. The estimated boundsare fairly close to the exact asymptotic bounds.

The SPACs and Their Significance Bounds

Figure 5.4 shows the SPACs of the simulation (5.4) and the estimated significance bounds of ther(h), at the 5% level (based on

(2)rm

). By comparing Figures 5.3 and 5.4, it can be seen that the

Page 115: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 99

2 4 6 8 10 12 h

−0.06

−0.04

−0.02

0.02

0.04

0.06

Figure 5.3 Sample autocorrelations of a simulation of size n = 5000 of the GARCH(1, 1) whitenoise (5.4). Approximately 95% of the SACRs of a GARCH(1, 1) white noise should lie insidethe thin dotted lines. The exact asymptotic bounds are shown as thick dotted lines.

2 4 6 8 10 12

−0.06

−0.04

−0.02

0.02

0.04

0.06

h

Figure 5.4 Sample partial autocorrelations of a simulation of size n = 5000 of the GARCH(1, 1)white noise (5.4). Approximately 95% of the SPACs of a GARCH(1, 1) white noise should lieinside the thin dotted lines. The exact asymptotic bounds are shown as thick dotted lines.

SACRs and SPACs of the GARCH simulation look much alike. This is not surprising in view ofTheorem B.4.

Portmanteau Tests of Strong White Noise and of Pure GARCH

Table 5.1 displays p-values of white noise tests based on Qm and the usual Ljung–Box statistics,for the simulation of (5.4). Apart from the test with m = 4, the Qm tests do not reject, at the5% level, the hypothesis that the data comes from a GARCH process. On the other hand, theLjung–Box tests clearly reject the strong white noise assumption.

Portmanteau Tests Based on Partial Autocorrelations

Table 5.2 is similar to Table 5.1, but presents portmanteau tests based on the SPACs. As expected,the results are very close to those obtained for the SACRs.

An Example Showing that Portmanteau Tests Based on the SPACs Can Be More Powerfulthan those Based on the SACRs

Consider a simulation of size n = 100 of the strong MA(2) model

Xt = ηt + 0.56ηt−1 − 0.44ηt−2, ηt iid N(0, 1). (5.5)

Page 116: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

100 GARCH MODELS

Table 5.1 Portmanteau tests on a simulation of size n = 5000 of the GARCH(1, 1) white noise(5.4).

Tests based on Qm, for the hypothesis of GARCH white noise

m 1 2 3 4 5 6ρ(m) 0.00 −0.06 −0.03 0.05 −0.02 0.00σρ(m) 0.025 0.028 0.024 0.024 0.021 0.026Qm 0.00 4.20 5.49 10.19 10.90 10.94P(χ2

m >Qm) 0.9637 0.1227 0.1391 0.0374 0.0533 0.0902

m 7 8 9 10 11 12ρ(m) 0.02 −0.01 −0.02 −0.02 0.00 0.01σρ(m) 0.019 0.023 0.019 0.016 0.017 0.015Qm 12.12 12.27 13.16 14.61 14.67 15.20P(χ2

m >Qm) 0.0967 0.1397 0.1555 0.1469 0.1979 0.2306

Usual tests, for the strong white noise hypothesis

m 1 2 3 4 5 6ρ(m) 0.00 −0.06 −0.03 0.05 −0.02 0.00σρ(m) 0.014 0.014 0.014 0.014 0.014 0.014QLBm 0.01 16.78 20.59 34.18 35.74 35.86

P(χ2m >QLB

m ) 0.9365 0.0002 0.0001 0.0000 0.0000 0.0000

m 7 8 9 10 11 12ρ(m) 0.02 −0.01 −0.02 −0.02 0.00 0.01σρ(m) 0.014 0.014 0.014 0.014 0.014 0.014QLBm 38.05 38.44 39.97 41.82 41.91 42.51

P(χ2m >QLB

m ) 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

By comparing the top two and bottom two parts of Table 5.3, we note that the hypotheses ofstrong white noise and pure GARCH are better rejected when the SPACs, rather than the SACRs,are used. This follows from the fact that, for this MA(2), only two theoretical autocorrelationsare not equal to 0, whereas many theoretical partial autocorrelations are far from 0. For thesame reason, the results would have been inverted if, for instance, an AR(1) alternative hadbeen considered.

5.2 Identifying the ARMA Orders of an ARMA-GARCH

Assume that the tools developed in Section 5.1 lead to rejection of the hypothesis that the data is arealization of a pure GARCH process. It is then sensible to look for an ARMA(P,Q) model withGARCH innovations. The problem is then to choose (or identify) plausible orders for the model

Xt −P∑i=1

aiXt−i = εt −Q∑i=1

biεt−i (5.6)

under standard assumptions (the AR and MA polynomials having no common root and havingroots outside the unit disk, with aP bQ = 0, Eε4

t <∞), where (εt ) is a GARCH white noise of theform (5.1).

Page 117: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 101

Table 5.2 As Table 5.1, for tests based on partial autocorrelations instead of autocorrelations.

GARCH white noise tests based on Qrm

m 1 2 3 4 5 6r(m) 0.00 −0.06 −0.03 0.05 −0.02 0.00σr(m) 0.025 0.028 0.024 0.024 0.021 0.026Qrm 0.00 4.20 5.49 9.64 10.65 10.650

P(χ2m >Qr

m) 0.9637 0.1227 0.1393 0.0470 0.0587 0.0998

m 7 8 9 10 11 12r(m) 0.02 −0.01 −0.01 −0.02 0.00 0.01σr(m) 0.019 0.023 0.019 0.016 0.017 0.015Qrm 11.92 12.24 12.77 14.24 14.24 14.67

P(χ2m >Qr

m) 0.1032 0.1407 0.1735 0.1623 0.2200 0.2599

Strong white noise tests based on Qr,LBm

m 1 2 3 4 5 6r(m) 0.02 −0.01 −0.01 −0.02 0.00 0.01σr(m) 0.014 0.014 0.014 0.014 0.014 0.014Qr,LBm 0.01 16.77 20.56 32.55 34.76 34.76

P(χ2m >Qr,LB

m ) 0.9366 0.0002 0.0001 0.0000 0.0000 0.0000

m 7 8 9 10 11 12r(m) 0.02 −0.01 −0.01 −0.02 0.00 0.01σr(m) 0.014 0.014 0.014 0.014 0.014 0.014Qr,LBm 37.12 37.94 38.84 40.71 40.71 41.20

P(χ2m >Qr,LB

m ) 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

5.2.1 Sample Autocorrelations of an ARMA-GARCHRecall that an MA(Q) satisfies ρX(h) = 0 for all h>Q, whereas an AR(P ) satisfies rX(h) =0 for all h>P . The SACRs and SPACs thus play an important role in identifying the ordersP and Q.

Invalidity of the Standard Bartlett Formula and Modified Formula

The validity of the usual Bartlett formula rests on assumptions including the strong white noisehypothesis (Theorem 1.1) which are obviously incompatible with GARCH errors. We shall seethat this formula leads to underestimation of the variances of the SACRs and SPACs, and thus toerroneous ARMA orders. We shall only consider the SACRs because Theorem B.2 shows that theasymptotic behavior of the SPACs easily follows from that of the SACRs.

We assume throughout that the law of ηt is symmetric. By Theorem B.5, the asymptotic behav-ior of the SACRs is determined by the generalized Bartlett formula (B.15). This formula involvesthe theoretical autocorrelations of (Xt) and (ε2

t ), as well as the ratio κε − 1 = γε2(0)/γ 2ε (0). More

precisely, using Remark 1 of Theorem 7.2.2 in Brockwell and Davis (1991), the generalized Bartlettformula is written as

limn→∞nCov {ρX(i), ρX(j)} = vij + v∗ij ,

Page 118: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

102 GARCH MODELS

Table 5.3 White noise portmanteau tests on a simulation of size n = 100 of the MA(2)model (5.5).

Tests of GARCH white noise based on autocorrelations

m 1 2 3 4 5 6Qm 1.6090 4.5728 5.5495 6.2271 6.2456 6.4654P(χ2

m >Qm) 0.2046 0.1016 0.1357 0.1828 0.2830 0.3731

Tests of GARCH white noise based on partial autocorrelations

m 1 2 3 4 5 6Qrm 1.6090 5.8059 9.8926 16.7212 21.5870 25.3162

P(χ2m >Qr

m) 0.2046 0.0549 0.0195 0.0022 0.0006 0.0003

Tests of strong white noise based on autocorrelations

m 1 2 3 4 5 6QLBm 3.4039 8.4085 9.8197 10.6023 10.6241 10.8905

P(χ2m >QLB

m ) 0.0650 0.0149 0.0202 0.0314 0.0594 0.0918

Tests of strong white noise based on partial autocorrelations

m 1 2 3 4 5 6Qr,BPm 3.3038 10.1126 15.7276 23.1513 28.4720 32.6397

P(χ2m >Qr,BP

m ) 0.0691 0.0064 0.0013 0.0001 0.0000 0.0000

where

vij =∞∑ =1

wi( )wj ( ), v∗ij = (κε − 1)∞∑ =1

ρε2( )wi( )wj ( ), (5.7)

andwi( ) = {2ρX(i)ρX( )− ρX( + i)− ρX( − i)} .

The following result shows that the standard Bartlett formula always underestimates the asymptoticvariances of the sample autocorrelations in presence of GARCH errors.

Proposition 5.1 Under the assumptions of Theorem B.5, if the linear innovation process (εt ) is aGARCH process with ηt symmetrically distributed, then

v∗ii ≥ 0 for all i > 0.

If, moreover, α1 > 0, Var(η2t ) > 0 and

∑+∞h=−∞ ρX(h) = 0, then

v∗ii > 0 for all i > 0.

Proof. From Proposition 2.2, we have ρε2( ) ≥ 0 for all , with strict inequality when α1 > 0.It thus follows immediately from (5.7) that v∗ii ≥ 0. When α1 > 0 this inequality is strict unless ifκε = 1 or wi( ) = 0 for all ≥ 1, that is,

2ρX(i)ρX( ) = ρX( + i)+ ρX( − i).

Page 119: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 103

Suppose this relations holds and note that it is also satisfied for all ∈ Z. Moreover, summingover , we obtain

2ρX(i)∞∑−∞

ρX( ) =∞∑−∞

ρX( + i)+ ρX( − i) = 2∞∑−∞

ρX( ).

Because the sum of all autocorrelations is supposed to be nonzero, we thus have ρX(i) = 1. Taking = i in the previous relation, we thus find that ρX(2i) = 1. Iterating this argument yields ρX(ni) =1, and letting n go to infinity gives a contradiction. Finally, one cannot have κε = 1 because

Var(ε2t ) = (Eε2

t )2(κε − 1) = Eh2

t Var(η2t )+ Varht ≥ ω2Var(η2

t ) > 0.�

Consider, by way of illustration, the ARMA(2,1)-GARCH(1, 1) process defined byXt − 0.8Xt−1 + 0.8Xt−2 = εt − 0.8εt−1

εt = σtηt , ηt iid N(0, 1)σ 2t = 1+ 0.2ε2

t−1 + 0.6σ 2t−1.

(5.8)

Figure 5.5 shows the theoretical autocorrelations and partial autocorrelations for this model. Thebands shown as solid lines should contain approximately 95% of the SACRs and SPACs, for arealization of size n = 1000 of this model. These bands are obtained from formula (B.15), theautocorrelations of (ε2

t ) being computed as in Section 2.5.3. The bands shown as dotted linescorrespond to the standard Bartlett formula (still at the 95% level). It can be seen that using thisformula, which is erroneous in the presence of GARCH, would lead to identification errors becauseit systematically underestimates the variability of the sample autocorrelations (Proposition 5.1).

Algorithm for Estimating the Generalized Bands

In practice, the autocorrelations of (Xt) and (ε2t ), as well as the other theoretical quantities involved

in the generalized Bartlett formula (B.15), are obviously unknown. We propose the followingalgorithm for estimating such quantities:

1. Fit an AR(p0) model to the data X1, . . . , Xn using an information criterion for the selectionof the order p0.

2. Compute the autocorrelations ρ1(h), h = 1, 2, . . . , of this AR(p0) model.

3. Compute the residuals ep0+1, . . . , en of this estimated AR(p0).

4. Fit an AR(p1) model to the squared residuals e2p0+1, . . . , e

2n, again using an information

criterion for p1.

2 4 6 8 10 12

−0.75

−0.5

−0.25

0.25

0.5

0.75

2 4 6 8 10 12 hh

−0.8

−0.6

−0.4

−0.2

0.2

0.4

Figure 5.5 Autocorrelations (left) and partial autocorrelations (right) for model (5.8). Approxi-mately 95% of the SACRs (SPACs) of a realization of size n = 1000 should lie between the bandsshown as solid lines. The bands shown as dotted lines correspond to the standard Bartlett formula.

Page 120: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

104 GARCH MODELS

5. Compute the autocorrelations ρ2(h), h = 1, 2, . . . , of this AR(p1) model.

6. Estimate limn→∞ nCov {ρ(i), ρ(j)} by vij + v∗ij , where

vij = max∑

=− max

ρ1( )[2ρ1(i)ρ1(j)ρ1( )− 2ρ1(i)ρ1( + j)

−2ρ1(j)ρ1( + i)+ ρ1( + j − i)+ ρ1( − j − i)],

v∗ij =γε2 (0)

γ 2ε (0)

max∑ =− max

ρ2( )[2ρ1(i)ρ1(j)ρ

21 ( )− 2ρ1(j)ρ1( )ρ1( + i)

−2ρ1(i)ρ1( )ρ1( + j)+ ρ1( + i) {ρ1( + j)+ ρ1( − j)}] ,γε2 (0) = 1

n− p0

n∑t=p0+1

e4t − γ 2

ε (0), γ 2ε (0) =

1

n− p0

n∑t=p0+1

e2t ,

and max is a truncation parameter, numerically determined so as to have |ρ1( )| and |ρ2( )|less than a certain tolerance (for instance, 10−5) for all > max.

This algorithm is fast when the Durbin–Levinson algorithm is used to fit the AR models. Figure 5.6shows an application of this algorithm (using the BIC information criterion).

5.2.2 Sample Autocorrelations of an ARMA-GARCH Process Whenthe Noise is Not Symmetrically Distributed

The generalized Bartlett formula (B.15) holds under condition (B.13), which may not be satisfiedif the distribution of the noise ηt , in the GARCH equation, is not symmetric. We shall consider theasymptotic behavior of the SACVs and SACRs for very general linear processes whose innovation(εt ) is a weak white noise. Retaining the notation of Theorem B.5, the following property allowsthe asymptotic variance of the SACRs to be interpreted as the spectral density at 0 of a vectorprocess (see, for instance, Brockwell and Davis, 1991, for the concept of spectral density). Letγ0:m = (γ (0), . . . , γ (m))′.

Theorem 5.3 Let (Xt )t∈Z be a real stationary process satisfying

Xt =∞∑

j=−∞ψjεt−j ,

∞∑j=−∞

|ψj | <∞,

5 10 15 20

−0.6

−0.4

−0.2

0.2

5 10 15 20 h h

−0.6

−0.4

−0.2

0.2

0.4

Figure 5.6 SACRs (left) and SPACs (right) of a simulation of size n = 1000 of model (5.8). Thedotted lines are the estimated 95% confidence bands.

Page 121: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 105

where (εt )t∈Z is a weak white noise such that Eε4t <∞. Let ϒt = Xt(Xt , Xt+1, . . . , Xt+m)′,

�ϒ(h) = Eϒ∗t ϒ∗′t+h and

fϒ∗ (λ) := 1

+∞∑h=−∞

e−ihλ�ϒ(h),

the spectral density of the process ϒ∗ = (ϒ∗t ), ϒ∗t = ϒt −Eϒt . Then we have

limn→∞nVarγ0:m := γ0:m = 2πfϒ∗(0). (5.9)

Proof. By stationarity and application of the Lebesgue dominated convergence theorem,

nVarγ0:m + o(1) = nCov

(1

n

n∑t=1

ϒ∗t ,1

n

n∑s=1

ϒ∗s

)

=n−1∑

h=−n+1

(1− |h|

n

)Cov

(ϒ∗t , ϒ

∗t+h)

→+∞∑

h=−∞�ϒ(h) = 2πfϒ∗ (0)

as n→∞. �

The matrix γ0:m involved in (5.9) is called the long-run variance in the econometric literature,as a reminder that it is the limiting variance of a sample mean. Several methods can be consideredfor long-run variance estimation.

(i) The naive estimator based on replacing the �ϒ(h) by the �ϒ(h) in fϒ∗(0) is inconsistent(Exercise 1.2). However, a consistent estimator can be obtained by weighting the �ϒ(h),using a weight close to 1 when h is very small compared to n, and a weight close to 0when h is large. Such an estimator is called heteroscedastic and autocorrelation consistent(HAC) in the econometric literature.

(ii) A consistent estimator of fϒ∗ (0) can also be obtained using the smoothed periodogram (seeBrockwell and Davis, 1991, Section 10.4).

(iii) For a vector AR(r),

Ar (B)Yt := Yt −r∑i=1

AiYt−i = Zt , (Zt ) white noise with variance Z,

the spectral density at 0 is

fY (0) = 1

2πAr (1)

−1ZA′r (1)−1.

A vector AR model is easily fitted, even a high-order AR, using a multivariate version ofthe Durbin–Levinson algorithm (see Brockwell and Davis, 1991, p. 422). The followingmethod can thus be proposed:

1. Fit AR(r) models, with r = 0, 1 . . . , R, to the data ϒ1 −ϒn, . . . , ϒn−m −ϒn whereϒn = (n−m)−1∑n−m

t=1 ϒt .

2. Select a value r0 by minimizing an information criterion, for instance the BIC.

Page 122: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

106 GARCH MODELS

3. Takeγ0:m = Ar0(1)

−1r0 A′r0(1)−1,

with obvious notation.

In our applications we used method (iii).

5.2.3 Identifying the Orders (P, Q)

Order determination based on the sample autocorrelations and partial autocorrelations in the mixedARMA(P,Q) model is not an easy task. Other methods, such as the corner method, presented inthe next section, and the epsilon algorithm, rely on more convenient statistics.

The Corner Method

Denote by D(i, j) the j × j Toeplitz matrix

D(i, j) =

ρX(i) ρX(i − 1) · · · ρX(i − j + 1)

ρX(i + 1)...

ρX(i + j − 1) · · · ρX(i + 1) ρX(i)

and let �(i, j) denote its determinant. Since ρX(h) =

∑Pi=1 aiρX(h− i) = 0, for all h>Q, it is

clear that D(i, j) is not a full-rank matrix if i >Q and j >P . More precisely, P and Q areminimal orders (that is, (Xt) does not admit an ARMA(P ′,Q′) representation with P ′ < P orQ′ < Q) if and only if �(i, j) = 0 ∀i >Q and ∀j >P,

�(i, P ) = 0 ∀i ≥ Q,

�(Q, j) = 0 ∀j ≥ P.

(5.10)

The minimal orders P and Q are thus characterized by the following table:

(T 1)

i\j 1 2 · · · Q Q+ 1 · · · ·1 ρ1 ρ2 · · · ρq ρq+1 · · · ····P × × × × × ×

P + 1 × 0 0 0 0 0× 0 0 0 0 0× 0 0 0 0 0× 0 0 0 0 0

where �(j, i) is at the intersection of row i and column j , and × denotes a nonzero element.The orders P and Q are thus characterized by a corner of zeros in table (T 1), hence the term

‘corner method’. The entries in this table are easily obtained using the recursion on j given by

�(i, j)2 = �(i + 1, j)�(i − 1, j)+�(i, j + 1)�(i, j − 1), (5.11)

and letting �(i, 0) = 1,�(i, 1) = ρX(|i|).

Page 123: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 107

Denote by D(i, j), �(i, j), (T 1), . . . the items obtained by replacing {ρX(h)} by {ρX(h)} inD(i, j), �(i, j), (T 1), . . . . Only a finite number of SACRs ρX(1), . . . , ρX(K) are available inpractice, which allows �(j, i) to be computed for i ≥ 1, j ≥ 1 and i + j ≤ K + 1. Table (T 1)is thus triangular. Because the �(j, i) consistently estimate the �(j, i), the orders P and Q arecharacterized by a corner of small values in table (T 1). However, the notion of ‘small value’ in(T 1) is not precise enough.3

It is preferable to consider the studentized statistics defined, for i = −K, . . . , K and j =0, . . . , K − |i| + 1, by

t (i, j) = √n�(i, j)σ�(i,j)

, σ 2�(i,j)

= ∂�(i, j)

∂ρ′KρK

∂�(i, j)

∂ρK, (5.12)

where ρK is a consistent estimator of the asymptotic covariance matrix of the first K SACRs,which can be obtained by the algorithm of Section 5.2.1 or by that of Section 5.2.2, and wherethe Jacobian ∂�(i,j)

∂ρ′K=(∂�(i,j)

∂ρX(1), . . . ,

∂�(i,j)

∂ρX(K)

)is obtained from the differentiation of (5.11):

∂�(i, 0)

∂ρX(k)= 0 for i = −K − 1, . . . , K − 1 and k = 1, . . . , K;

∂�(i, 1)

∂ρX(k)= 1l{k}(|i|) for i = −K, . . . , K and k = 1, . . . , K;

∂�(i, j + 1)

∂ρX(k)=

2�(i, j) ∂�(i,j)∂ρX(k)

− �(i + 1, j) ∂�(i−1,j)∂ρX(k)

− �(i − 1, j) ∂�(i+1,j)∂ρX(k)

�(i, j − 1)

−{�(i, j)2 − �(i + 1, j)�(i − 1, j)

}∂�(i,j−1)∂ρX(k)

�2(i, j − 1)

for k = 1, . . . , K , i = −K + j, . . . , K − j and j = 1, . . . , K .When �(i, j) = 0 the statistic t (i, j) asymptotically follows a N(0, 1) (provided, in particular,

that EX4t exists). If, in contrast, �(i, j) = 0 then

√n|t (i, j)| → ∞ a.s. when n→∞. We can

reject the hypothesis �(i, j) = 0 at level α if |t (i, j)| is beyond the (1− α/2)-quantile of a N(0, 1).We can also automatically detect a corner of small values in the table, (T 1) say, giving the t (i, j),if no entry in this corner is greater than this (1− α/2)-quantile in absolute value. This practicedoes not correspond to any formal test at level α, but allows a small number of plausible valuesto be selected for the orders P and Q.

Illustration of the Corner Method

For a simulation of size n = 1000 of the ARMA(2,1)-GARCH(1, 1) model (5.8) we obtain thefollowing table:

.p.|.q..1....2....3....4....5....6....7....8....9...10...11...12...1 | 17.6-31.6-22.6 -1.9 11.5 8.7 -0.1 -6.1 -4.2 0.5 3.5 2.12 | 36.1 20.3 12.2 8.7 6.5 4.9 4.0 3.3 2.5 2.1 1.8

3 | -7.8 -1.6 -0.2 0.5 0.7 -0.7 0.8 -1.4 1.2 -1.1

4 | 5.2 0.1 0.4 0.3 0.6 -0.1 -0.3 0.5 -0.2

5 | -3.7 0.4 -0.1 -0.5 0.4 -0.2 0.2 -0.2

6 | 2.8 0.6 0.5 0.4 0.2 0.4 0.2

3 Comparing �(i, j) and �(i ′, j ′) for j = j ′ (that is, entries of different rows in table (T 1)) is all the moredifficult as these are determinants of matrices of different sizes.

Page 124: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

108 GARCH MODELS

7 | -2.0 -0.7 0.2 0.0 -0.4 -0.3

8 | 1.7 0.8 0.0 0.2 0.2

9 | -0.6 -1.2 -0.5 -0.210 | 1.4 0.9 -0.2

11 | -0.2 -1.2

12 | 1.2

A corner of values which can be viewed as plausible realizations of the N(0, 1) can be observed.This corner corresponds to the rows 3, 4, . . . and the columns 2, 3, . . . , leading us to select theARMA(2, 1) model. The automatic detection routine for corners of small values gives:

ARMA(P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 2, 8) ( 3, 1) (10, 0)0.100000 1.64 ( 2, 1) ( 8, 0)

0.050000 1.96 ( 1,10) ( 2, 1) ( 7, 0)

0.020000 2.33 ( 0,11) ( 1, 9) ( 2, 1) ( 6, 0)0.010000 2.58 ( 0,11) ( 1, 8) ( 2, 1) ( 6, 0)

0.005000 2.81 ( 0,11) ( 1, 8) ( 2, 1) ( 5, 0)

0.002000 3.09 ( 0,11) ( 1, 8) ( 2, 1) ( 5, 0)0.001000 3.29 ( 0,11) ( 1, 8) ( 2, 1) ( 5, 0)

0.000100 3.72 ( 0, 9) ( 1, 7) ( 2, 1) ( 5, 0)

0.000010 4.26 ( 0, 8) ( 1, 6) ( 2, 1) ( 4, 0)

We retrieve the orders (P,Q) = (2, 1) of the simulated model, but also other plausible orders.This is not surprising since the ARMA(2, 1) model can be well approximated by other ARMAmodels, such as an AR(6), an MA(11) or an ARMA(1, 8) (but in practice, the ARMA(2, 1) shouldbe preferred for parsimony reasons).

5.3 Identifying the GARCH Orders of an ARMA-GARCHModel

The Box–Jenkins methodology described in Chapter 1 for ARMA models can be adapted toGARCH(p, q) models. In this section we consider only the identification problem. First supposethat the observations are drawn from a pure GARCH. The choice of a small number of plausiblevalues for the orders p and q can be achieved in several steps, using various tools:

(i) inspection of the sample autocorrelations and sample partial autocorrelations of ε21 , . . . , ε

2n;

(ii) inspection of statistics that are functions of the sample autocovariances of ε2t (corner method,

epsilon algorithm, . . . );

(iii) use of information criteria (AIC, BIC, . . . );

(iv) tests of the significance of certain coefficients;

(v) analysis of the residuals.

Steps (iii) and (v), and to a large extent step (iv), require the estimation of models, and areused to validate or modify them. Estimation of GARCH models will be studied in detail in theforthcoming chapters. Step (i) relies on the ARMA representation for the square of a GARCH

Page 125: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 109

process. In particular, if (εt ) is an ARCH(q) process, then the theoretical partial autocorrelationfunction rε2(·) of (ε2

t ) satisfies

rε2 (h) = 0, ∀h>q.

For mixed models, the corner method can be used.

5.3.1 Corner Method in the GARCH CaseTo identify the orders of a GARCH(p, q) process, one can use the fact that (ε2

t ) follows anARMA(P , Q) with P = max(p, q) and Q = p. In the case of a pure GARCH, (εt ) = (Xt ) isobserved. The asymptotic variance of the SACRs of ε2

1 , . . . , ε2n can be estimated by the method

described in Section 5.2.2. The table of studentized statistics for the corner method follows, asdescribed in the previous section. The problem is then to detect at least one corner of normalvalues starting from the row P + 1 and the column Q+ 1 of the table, under the constraintsP ≥ 1 (because max(p, q) ≥ q ≥ 1) and P ≥ Q. This leads to selection of GARCH(p, q) mod-els such that (p, q) = (Q, P ) when Q < P and (p, q) = (Q, 1), (p, q) = (Q, 2), . . . ,(p, q) =(Q, P ) when Q ≥ P .

In the ARMA-GARCH case the εt are unobserved but can be approximated by the ARMAresiduals. Alternatively, to avoid the ARMA estimation, residuals from fitted ARs, as described insteps 1 and 3 of the algorithm of Section 5.2.1, can be used.

5.3.2 ApplicationsA Pure GARCH

Consider a simulation of size n = 5000 of the GARCH(2, 1) model{εt = σtηtσ 2t = ω + αε2

t−1 + β1σ2t−1 + β2σ

2t−2,

(5.13)

where (ηt ) is a sequence of iid N(0, 1) variables, ω = 1, α = 0.1, β1 = 0.05 and β2 = 0.8.The table of studentized statistics for the corner method is as follows:

.max(p,q).|.p..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...1 | 5.3 2.9 5.1 2.2 5.3 5.9 3.6 3.7 2.9 2.9 3.4 1.4 5.8 2.4 3.0

2 | -2.4 -3.5 2.4 -4.4 2.2 -0.7 0.6 -0.7 -0.3 0.4 1.1 -2.5 2.8 -0.2

3 | 4.9 2.4 0.7 1.7 0.7 -0.8 0.2 0.4 0.3 0.3 0.7 1.4 1.4

4 | -0.4 -4.3 -1.8 -0.6 1.0 -0.6 0.4 -0.4 0.5 -0.6 0.4 -1.1

5 | 4.6 2.4 0.6 0.9 0.8 0.5 0.3 -0.4 -0.5 0.5 -0.8

6 | -3.1 -1.7 1.4 -0.8 -0.3 0.3 0.3 -0.5 0.5 0.4

7 | 3.1 1.2 0.3 0.6 0.3 0.2 0.5 0.1 -0.7

8 | -1.0 -1.3 -0.7 -0.5 0.8 -0.5 0.3 -0.6

9 | 1.5 0.3 0.2 0.7 -0.5 0.5 -0.7

10 | -1.7 0.1 0.3 -0.7 -0.6 0.5

11 | 1.8 1.2 0.6 0.7 -1.0

12 | 1.6 -1.3 -1.4 -1.1

13 | 4.2 2.3 1.4

14 | -1.2 -0.6

15 | 1.4

A corner of plausible N(0, 1) values is observed starting from the row P + 1 = 3 and the columnQ+ 1 = 3, which corresponds to GARCH(p, q) models such that (max(p, q), p) = (2, 2), that

Page 126: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

110 GARCH MODELS

is, (p, q) = (2, 1) or (p, q) = (2, 2). A small number of other plausible values are detected for(p, q).

GARCH(p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 3, 1) ( 3, 2) ( 3, 3) ( 1,13)0.100000 1.64 ( 3, 1) ( 3, 2) ( 3, 3) ( 2, 4) ( 0,13)

0.050000 1.96 ( 2, 1) ( 2, 2) ( 0,13)

0.020000 2.33 ( 2, 1) ( 2, 2) ( 1, 5) ( 0,13)0.010000 2.58 ( 2, 1) ( 2, 2) ( 1, 4) ( 0,13)

0.005000 2.81 ( 2, 1) ( 2, 2) ( 1, 4) ( 0,13)

0.002000 3.09 ( 2, 1) ( 2, 2) ( 1, 4) ( 0,13)0.001000 3.29 ( 2, 1) ( 2, 2) ( 1, 4) ( 0,13)

0.000100 3.72 ( 2, 1) ( 2, 2) ( 1, 4) ( 0,13)0.000010 4.26 ( 2, 1) ( 2, 2) ( 1, 4) ( 0, 5)

An ARMA-GARCH

Let us resume the simulation of size n = 1000 of the ARMA(2, 1)-GARCH(1, 1) model (5.8). Thetable of studentized statistics for the corner method, applied to the SACRs of the observed process,was presented in Section 5.2.3. A small number of ARMA models, including the ARMA(2, 1),was selected. Let e1+p0 , . . . , en denote the residuals when an AR(p0) is fitted to the observations,the order p0 being selected using an information criterion.4 Applying the corner method again,but this time on the SACRs of the squared residuals e2

1+p0, . . . , e2

n, and estimating the covariancesbetween the SACRs by the multivariate AR spectral approximation, as described in Section 5.2.2,we obtain the following table:

.max(p,q).|.p..1....2....3....4....5....6....7....8....9...10...11...12...1 | 4.5 4.1 3.5 2.1 1.1 2.1 1.2 1.0 0.7 0.4 -0.2 0.9

2 | -2.7 0.3 -0.2 0.1 -0.4 0.5 -0.2 0.2 -0.1 0.4 -0.23 | 1.4 -0.2 0.0 -0.2 0.2 0.3 -0.2 0.1 -0.2 0.1

4 | -0.9 0.1 0.2 0.2 -0.2 0.2 0.0 -0.2 -0.1

5 | 0.3 -0.4 0.2 -0.2 0.1 0.1 -0.1 0.16 | -0.7 0.4 -0.2 0.2 -0.1 0.1 -0.1

7 | 0.0 -0.1 -0.2 0.1 -0.1 -0.28 | -0.1 0.1 -0.1 -0.2 -0.1

9 | -0.3 0.1 -0.1 -0.1

10 | 0.1 -0.2 -0.111 | -0.4 0.2

12 | -1.0

A corner of values compatible with the N(0, 1) is observed starting from row 2 and column 2,which corresponds to a GARCH(1, 1) model. Another corner can be seen below row 2, whichcorresponds to a GARCH(0, 2) = ARCH(2) model. In practice, in this identification step, at leastthese two models would be selected. The next step would be the estimation of the selected models,followed by a validation step involving testing the significance of the coefficients, examining theresiduals and comparing the models via information criteria. This validation step allows a finalmodel to be retained which can be used for prediction purposes.

4 One can also use the innovations algorithm of Brockwell and Davis (1991, p. 172) for rapid fitting ofMA models. Alternatively, one of the previously selected ARMA models, for instance the ARMA(2, 1), canbe used to approximate the innovations.

Page 127: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 111

GARCH(p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 1, 1) ( 0, 3)

0.100000 1.64 ( 1, 1) ( 0, 2)

0.050000 1.96 ( 1, 1) ( 0, 2)

0.020000 2.33 ( 1, 1) ( 0, 2)

0.010000 2.58 ( 1, 1) ( 0, 2)

0.005000 2.81 ( 0, 1)

0.002000 3.09 ( 0, 1)

0.001000 3.29 ( 0, 1)

0.000100 3.72 ( 0, 1)

0.000010 4.26 ( 0, 1)

5.4 Lagrange Multiplier Test for ConditionalHomoscedasticity

To test linear restrictions on the parameters of a model, the most widely used tests are the Waldtest, the Lagrange multiplier (LM) test and likelihood ratio (LR) test. The LM test, also referredto as the Rao test or the score test, is attractive because it only requires estimation of the restrictedmodel (unlike the Wald and LR tests which will be studied in Chapter 8), which is often mucheasier than estimating the unrestricted model. We start by deriving the general form of the LMtest. Then we present an LM test for conditional homoscedasticity in Section 5.4.2.

5.4.1 General Form of the LM TestConsider a parametric model, with true parameter value θ0 ∈ Rd , and a null hypothesis

H0 : Rθ0 = r,

where R is a given s × d matrix of full rank s, and r is a given s × 1 vector. This formulationallows one to test, for instance, whether the first s components of θ0 are null (it suffices to setR = [Is : 0s×(d−s)

]and r = 0s). Let �n(θ) denote the log-likelihood of observations X1, . . . , Xn.

We assume the existence of unconstrained and constrained (by H0) maximum likelihood estimators,respectively satisfying

θ = arg supθ

�n(θ) and θ c = arg supθ :Rθ=r

�n(θ).

Under some regularity assumptions (which will be discussed in detail in Chapter 7 for theGARCH(p, q) model) the score vector satisfies a central limit theorem and we have

1√n

∂θ�n(θ0)

L→ N (0,I) and√n(θ − θ0)

L→ N (0, I−1) , (5.14)

where I is the Fisher information matrix. To derive the constrained estimator we introduce theLagrangian

L(θ, λ) = �n(θ)− λ′(Rθ − r).

We have(θ c, λ) = arg sup

(θ,λ)

L(θ, λ).

Page 128: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

112 GARCH MODELS

The first-order conditions give

R′λ = ∂

∂θ�n(θ

c) and Rθc = r.

The second convergence in (5.14) thus shows that under H0,

√nR(θ − θ c) = √nR(θ − θ0)

L→ N (0, RI−1R′). (5.15)

Using the convention ac= b for a = b + c, asymptotic expansions entail, under usual regularity

conditions (more rigorous statements will be given in Chapter 7),

0 = 1√n

∂θ�n(θ)

oP (1)= 1√n

∂θ�n(θ0)− I

√n(θ − θ0),

1√n

∂θ�n(θ

c)oP (1)= 1√

n

∂θ�n(θ0)− I

√n(θ c − θ0),

which, by subtraction, gives

√n(θ − θ c)

oP (1)= I−1 1√n

∂θ�n(θ

c) = I−1 1√nR′λ. (5.16)

Finally, (5.15) and (5.16) imply

1√nλoP (1)= (

RI−1R′)−1√

nR(θ − θ0)L→ N

{0,(RI−1R′

)−1}

and then1√n

∂θ�n(θ

c) = R′1√nλ

L→ N{

0, R′(RI−1R′

)−1R}.

Thus, under H0, the test statistic

LMn := 1

nλ′RI−1R′λ = 1

n

∂θ ′�n(θ

c)I−1 ∂

∂θ�n(θ

c) (5.17)

asymptotically follows a χ2s , provided that I is an estimator converging in probability to I. In

general one can take

I = − 1

n

∂2�n(θc)

∂θ∂θ ′.

The critical region of the LM test at the asymptotic level α is {LMn >χ2s (1− α)}.

The Case where the LMn Statistic Takes the Form nR2

Implementation of an LM test can sometimes be extremely simple. Consider a nonlinear con-ditionally homoscedastic model in which a dependent variable Yt is related to its past valuesand to a vector of exogenous variables Xt by Yt = Fθ0(Wt )+ εt , where εt is iid (0, σ 2

0 ) andWt = (Xt , Yt−1, . . . ). Assume, in addition, that Wt and εt are independent. We wish to test thehypothesis

H0 : ψ0 = 0

where

θ0 =(

β0

ψ0

), β0 ∈ Rd−s, ψ0 ∈ Rs .

Page 129: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 113

To retrieve the framework of the previous section, let R = [0s×(d−s) : Is]

and note that

∂ψ�n(θ

c) = R∂

∂θ�n(θ

c) = RR′λ = λ and1√nλ

L→ N (0, λ) ,

where λ =(I22)−1

and I22 = RI−1R′ is the bottom right-hand block of I−1. Suppose thatσ 2

0 = σ 2(β0) does not depend on ψ0. With a Gaussian likelihood (Exercise 5.9) we have

1√nλ = 1√

nσ c 2

n∑t=1

εt (θc)

∂ψFθc (Wt ) = 1√

nσ c 2F′ψ Uc,

where εt (θ) = Yt − Fθ(Wt ), σ c 2 = σ 2(βc), εct = εt (θc), Uc = (εc1, . . . , ε

cn)′ and

F′ψ =(∂Fθc (W1)

∂ψ· · · ∂Fθc (Wn)

∂ψ

).

Partition I into blocks as

I =(

I11 I12

I21 I22

),

where I11 and I22 are square matrices of respective sizes d − s and s. Under the assumption that theinformation matrix I is block-diagonal (that is, I12 = 0), we have I22 = I−1

22 where I22 = RIR′,which entails λ = I22. We can then choose

λ = 1

σ c 4

1

nUc ′Uc 1

nF′ψFψ

oP (1)= 1

Uc ′ UcF′ψFψ

as a consistent estimator of λ. We end up with

LMn = nUc ′Fψ

(F′ψFψ

)−1F′ψ Uc

Uc ′Uc, (5.18)

which is nothing other than n times the uncentered determination coefficient in the regression ofεct on the variables ∂Fθc (Wt )/∂ψi for i = 1, . . . , s (Exercise 5.10).

LM Test with Auxiliary Regressions

We extend the previous framework by allowing I12 to be not equal to zero. Assume that σ 2 doesnot depend on θ . In view of Exercise 5.9, we can then estimate λ by5

∗λ =1

Uc ′ Uc

(F′ψFψ − F′ψFβ

(F′βFβ

)−1 F′βFψ

),

where

F′β =(∂Fθc (W1)

∂β· · · ∂Fθc (Wn)

∂β

).

Suppose the model is linear under the constraint H0, so that

Uc = Y− Fββc and σ c 2 = Uc ′ Uc/n

5 For a partitioned invertible matrix A =(

A11 A12

A21 A22

), where A11 and A22 are invertible square blocks,

the bottom right-hand block of A−1 is written as A22 =(A22 − A21A

−111 A12

)−1(Exercise 6.7).

Page 130: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

114 GARCH MODELS

withY = (Y1, . . . , Yn)

′ and βc = (F′βFβ

)−1 F′βY,

up to some negligible terms.Now consider the linear regression

Y = Fββ∗ + Fψψ

∗ + U. (5.19)

Exercise 5.10 shows that, in this auxiliary regression, the LM statistic for testing the hypothesis

H ∗0 : ψ∗ = 0

is given by

LM∗n = n−1 (σ c)−4 Uc ′Fψ

∗−1λ F′ψ Uc

= (σ c)−2 Uc ′Fψ

(F′ψFψ − F′ψFβ

(F′βFβ

)−1 F′βFψ

)−1F′ψ Uc.

This statistic is precisely the LM test statistic for the hypothesis H0 : ψ = 0 in the initial model.From Exercise 5.10, the LM test statistic of the hypothesis H ∗

0 : ψ∗ = 0 in model (5.19) can alsobe written as

LM∗n = n

Uc ′ Uc − U′UUc ′ Uc

, (5.20)

where U = Y− Fβ β∗ − Fψψ

∗ =: Y− Fθ∗, with θ∗ = (F′F)−1 F′Y. We finally obtain the so-called Breusch–Godfrey form of the LM statistic by interpreting LM∗

n in (5.20) as n times thedetermination coefficient of the auxiliary regression

Uc = Fβγ + Fψψ∗∗ + V, (5.21)

where Uc is the vector of residuals in the regression of Y on the columns of Fβ .Indeed, in the two regressions (5.19) and (5.21), the vector of residuals is V = U, because

β∗ = βc + γ and ψ∗ = ψ∗∗. Finally, we note that the determination coefficient is centered (inother words, it is R2 as provided by standard statistical software) when a column of Fβ is constant.

Quasi-LM Test

When �n(θ) is no longer supposed to be the log-likelihood, but only the quasi-log-likelihood(a thorough study of the quasi-likelihood for GARCH models will be made in Chapter 7), theequations can in general be replaced by

1√n

∂θ�n(θ0)

L→ N (0, I ) and√n(θ − θ0)

L→ N (0, J−1IJ−1) , (5.22)

where

I = limn→∞

1

nVar

∂θ�n(θ0), J = lim

n→∞1

n

∂2

∂θ∂θ ′�n(θ0) a.s.

It is then recommended that (5.17) be replaced by the more complex, but more robust, expression

LMn = 1

nλ′RJ−1R′

(RJ−1I J−1R′

)−1RJ−1R′λ

= 1

n

∂θ ′�n(θ

c)J−1R′(RJ−1I J−1R′

)−1RJ−1 ∂

∂θ�n(θ

c), (5.23)

where I and J are consistent estimators of I and J . A consistent estimator of J is obviouslyobtained as a sample mean. Estimating the long-run variance I requires more involved methods,such as those described on page 105 (HAC or other methods).

Page 131: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 115

5.4.2 LM Test for Conditional HomoscedasticityConsider testing the conditional homoscedasticity assumption

H0 : α01 = · · · = α0q = 0

in the ARCH(q) model{εt = σtηt , ηt iid (0, 1)σ 2t = ω0 +

∑q

i=1 α0iε2t−i , ω0 > 0, α0i ≥ 0.

At the parameter value θ = (ω, α1, . . . , αq) the quasi-log-likelihood is written, neglecting unim-portant constants, as

�n(θ) = −1

2

n∑t=1

{ε2t

σ 2t (θ)

+ logσ 2t (θ)

}, σ 2

t (θ) = ω +q∑i=1

αiε2t−i ,

with the convention εt−i = 0 for t ≤ 0. The constrained quasi-maximum likelihood estimator is

θ c = (ωc, 0, . . . , 0), where ωc = σ 2

t (θc) = 1

n

n∑t=1

ε2t .

6

At θ0 = (ω0, . . . , 0), the score vector satisfies

1√n

∂�n(θ0)

∂θ= 1

2√n

n∑t=1

1

σ 4t (θ0)

{ε2t − σ 2

t (θ0)} ∂σ 2

t (θ0)

∂θ

= 1

2√n

n∑t=1

1

ω0

(η2t − 1

)

1ε2t−1...

ε2t−q

L→ N (0, I ) , I = κη − 1

4ω20

(1 ω0

ω′0 I22

)under H0, where ω0 = (ω0, . . . , ω0)

′ ∈ Rq and I22 is a matrix whose diagonal elements are ω20κη

with κη = Eη4t , and whose other entries are equal to ω2

0. The bottom right-hand block of I−1 isthus

I 22 = 4ω20

κη − 1

{I22 − ω0ω

′0

}−1 = 4ω20

κη − 1

{(κη − 1)ω2

0Iq}−1 = 4

(κη − 1)2Iq . (5.24)

In addition, we have

1

n

∂2�n(θ0)

∂θ∂θ ′= 1

2n

n∑t=1

1

σ 6t (θ0)

{2ε2

t − σ 2t (θ0)

} ∂σ 2t (θ0)

∂θ

∂σ 2t (θ0)

∂θ ′

= 1

2n

n∑t=1

1

ω20

(2η2

t − 1)

1ε2t−1...

ε2t−q

(1, ε2t−1, . . . , ε

2t−q)

→ J = 2

κη − 1I, a.s.

6 Indeed, the function σ 2 �→ x/σ 2 + n log σ 2 reaches its minimum at σ 2 = x/n.

Page 132: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

116 GARCH MODELS

From (5.23), using estimators of I and J such that J = 2/(κη − 1)I , we obtain

LMn = 2

κη − 1

1

nλ′RJ−1R′λ = 1

n

∂θ ′�n(θ

c)I−1 ∂

∂θ�n(θ

c).

Using (5.24) and noting that

∂θ�n(θ

c) =(

0∂∂α

�n(θc)

),

∂α�n(θ

c) = 1

2

n∑t=1

1

ωc

(ε2t

ωc− 1

) ε2t−1...

ε2t−q

,

we obtain

LMn = 1

n

∂α′�n(θ

c)I 22 ∂

∂α�n(θ

c) = 1

n

q∑h=1

{1

κη − 1

n∑t=1

(ε2t

ωc− 1

)ε2t−hωc

}2

. (5.25)

Equivalence with a Portmanteau Test

Usingn∑t=1

(ε2t

ωc− 1

)= 0 and

1

n

n∑t=1

(ε2t

ωc− 1

)2

= κη − 1,

it follows from (5.25) that

LMn = n

q∑h=1

ρ2ε2(h), (5.26)

which shows that the LM test is equivalent to a portmanteau test on the squares.

Expression in Terms of R2

To establish a connection with the linear model, write

∂θ�n(θ

c) = n−1X′Y,

where Y is the n× 1 vector 1− ε2t /ω

c, and X is the n× (q + 1) matrix with first column 1/2ωc

and (i + 1)th column ε2t−i/2ωc. Estimating I by (κη − 1)n−1X′X, where κη − 1 = n−1Y ′Y , we

obtain

LMn = nY ′X(X′X)−1X′Y

Y ′Y, (5.27)

which can be interpreted as n times the determination coefficient in the linear regression of Yon the columns of X. Because the determination coefficient is invariant by linear transformationof the variables (Exercise 5.11), we simply have LMn = nR2 where R2 is the determinationcoefficient7 of the regression of ε2

t on a constant and q lagged variables ε2t−1, . . . , ε

2t−q . Under the

null hypothesis of conditional homoscedasticity, LMn asymptotically follows a χ2q . The version of

the LM statistic given in (5.27) differs from the one given in (5.25) because (5.24) is not satisfiedwhen I is replaced by (κη − 1)n−1X′X.

7 We mean here the centered determination coefficient (the one usually given by standard software) not theuncentered one as was the case in Section 5.4.1. There is sometimes confusion between these coefficients inthe literature.

Page 133: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 117

5.5 Application to Real Series

Consider the returns of the CAC 40 stock index from March 2, 1990 to December 29, 2006 (4245observations) and of the FTSE 100 index of the London Stock Exchange from April 3, 1984to April 3, 2007 (5812 observations). The correlograms for the returns and squared returns aredisplayed in Figure 5.7. The bottom correlograms of Figure 5.7, as well as the portmanteau testsof Table 5.4, clearly show that, for the two indices, the strong white noise assumption cannotbe sustained. These portmanteau tests can be considered as versions of LM tests for conditionalhomoscedasticity (see Section 5.4.2). Table 5.5 displays the nR2 version of the LM test of Section5.4.2. Note that the two versions of the LM statistic are quite different but lead to the sameunambiguous conclusions: the hypothesis of no ARCH effect must be rejected, as well as thehypothesis of absence of autocorrelation for the CAC 40 or FTSE 100 returns.

The first correlogram of Figure 5.7 and the first part of Table 5.6 lead us to think that theCAC 40 series is fairly compatible with a weak white noise structure (and hence with a GARCHstructure). Recall that the 95% significance bands, shown as dotted lines on the upper correlogramsof Figure 5.7, are valid under the strong white noise assumption but may be misleading for weakwhite noises (such as GARCH). The second part of Table 5.6 displays classical Ljung–Box testsfor noncorrelation. It may be noted that the CAC 40 returns series does not pass the classicalportmanteau tests.8 This does not mean, however, that the white noise assumption should be

0 10 155 20 25 30 35

0 10 155 20 25 30 35

0.0

0.1

0.2

0.3

0.4

Aut

ocor

rela

tions

0.0

0.1

0.2

0.3

0.4

Aut

ocor

rela

tions

CAC Returns

0 10 20 30

FTSE Returns

0.0

0.1

0.2

0.3

0.4

Aut

ocor

rela

tions

0.0

0.1

0.2

0.3

0.4

Aut

ocor

rela

tions

Squared CAC Returns

0 10 20 30

Squared FTSE Returns

Figure 5.7 Correlograms of returns and squared returns of the CAC 40 index (March 2, 1990 toDecember 29, 2006) and the FTSE 100 index (April 3, 1984 to April 3, 2007).

8 Classical portmanteau tests are those provided by standard commercial software, in particular those ofthe table entitled ‘Autocorrelation Check for White Noise’ of the ARIMA procedure in SAS.

Page 134: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

118 GARCH MODELS

Table 5.4 Portmanteau tests on the squared CAC 40 returns (March 2, 1990 to December 29,2006) and FTSE 100 returns (April 3, 1984 to April 3, 2007).

Tests for noncorrelation of the squared CAC 40

m 1 2 3 4 5 6ρε2(m) 0.181 0.226 0.231 0.177 0.209 0.236σρ

ε2 (m) 0.030 0.030 0.030 0.030 0.030 0.030QLBm 138.825 356.487 580.995 712.549 896.465 1133.276

p-value 0.000 0.000 0.000 0.000 0.000 0.000

m 7 8 9 10 11 12ρε2(m) 0.202 0.206 0.184 0.198 0.201 0.173σρ

ε2 (m) 0.030 0.030 0.030 0.030 0.030 0.030QLBm 1307.290 1486.941 1631.190 1798.789 1970.948 2099.029

p-value 0.000 0.000 0.000 0.000 0.000 0.000

Tests for noncorrelation of the squared FTSE 100

m 1 2 3 4 5 6ρε2(m) 0.386 0.355 0.194 0.235 0.127 0.161σρ

ε2 (m) 0.026 0.026 0.026 0.026 0.026 0.026QLBm 867.573 1601.808 1820.314 2141.935 2236.064 2387.596

p-value 0.000 0.000 0.000 0.000 0.000 0.000

m 7 8 9 10 11 12ρε2(m) 0.160 0.151 0.115 0.148 0.141 0.135σρ

ε2 (m) 0.030 0.030 0.030 0.030 0.030 0.030QLBm 964.803 1061.963 1118.258 1211.899 1296.512 1374.324

p-value 0.000 0.000 0.000 0.000 0.000 0.000

Table 5.5 LM tests for conditional homoscedasticity of the CAC 40 and FTSE 100.

Tests for absence of ARCH for the CAC 40

m 1 2 3 4 5 6 7 8 9LMn 138.7 303.3 421.7 451.7 500.8 572.4 600.3 621.6 629.7p-value 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Tests for absence of ARCH for the FTSE 100

m 1 2 3 4 5 6 7 8 9LMn 867.1 1157.3 1157.4 1220.8 1222.4 1236.6 1237.0 1267.0 1267.3p-value 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

rejected. Indeed, we know that such classical portmanteau tests are invalid for conditionally hetero-scedastic series.

Table 5.7 is the analog of Table 5.6 for the FTSE 100 index. Conclusions are more disputablein this case. Although some p-values of the upper part of Table 5.7 are slightly less than 5%,one cannot exclude the possibility that the FTSE 100 index is a weak (GARCH) white noise.

Page 135: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 119

Table 5.6 Portmanteau tests on the CAC 40 (March 2, 1990 to December 29, 2006).

Tests of GARCH white noise based on Qm

m 1 2 3 4 5 6 7 8ρ(m) 0.016 −0.020 −0.045 0.015 −0.041 −0.023 −0.025 0.014σρ(m) 0.041 0.044 0.044 0.041 0.043 0.044 0.042 0.043Qm 0.587 1.431 5.544 6.079 9.669 10.725 12.076 12.475p-value 0.443 0.489 0.136 0.193 0.085 0.097 0.098 0.131

m 9 10 11 12 13 14 15 16ρ(m) 0.000 0.011 0.010 −0.014 0.020 0.024 0.037 0.001σρ(m) 0.041 0.042 0.042 0.041 0.043 0.040 0.040 0.040Qm 12.476 12.718 12.954 13.395 14.214 15.563 18.829 18.833p-value 0.188 0.240 0.296 0.341 0.359 0.341 0.222 0.277

Usual tests for strong white noise

m 1 2 3 4 5 6 7 8ρ(m) 0.016 −0.020 −0.045 0.015 −0.041 −0.023 −0.025 0.014σρ(m) 0.030 0.030 0.030 0.030 0.030 0.030 0.030 0.030QLBm 1.105 2.882 11.614 12.611 19.858 22.134 24.826 25.629

p-value 0.293 0.237 0.009 0.013 0.001 0.001 0.001 0.001

m 9 10 11 12 13 14 15 16ρ(m) 0.000 0.011 0.010 −0.014 0.020 0.024 0.037 0.001σρ(m) 0.030 0.030 0.030 0.030 0.030 0.030 0.030 0.030QLBm 25.629 26.109 26.579 27.397 29.059 31.497 37.271 37.279

p-value 0.002 0.004 0.005 0.007 0.006 0.005 0.001 0.002

Table 5.7 Portmanteau tests on the FTSE 100 (April 3, 1984 to April 3, 2007).

Tests of GARCH white noise based on Qm

m 1 2 3 4 5 6 7 8ρ(m) 0.023 −0.002 −0.059 0.041 −0.021 −0.021 −0.006 0.039σρ(m) 0.057 0.055 0.044 0.047 0.039 0.042 0.037 0.042Qm 0.618 0.624 7.398 10.344 11.421 12.427 12.527 15.796p-value 0.432 0.732 0.060 0.035 0.044 0.053 0.085 0.045

m 9 10 11 12 13 14 15 16ρ(m) 0.029 0.000 0.019 −0.003 0.023 −0.013 0.019 −0.022σρ(m) 0.036 0.041 0.038 0.037 0.037 0.036 0.035 0.039Qm 18.250 18.250 19.250 19.279 20.700 21.191 22.281 23.483p-value 0.032 0.051 0.057 0.082 0.079 0.097 0.101 0.101

Usual tests for strong white noise

m 1 2 3 4 5 6 7 8ρ(m) 0.023 −0.002 −0.059 0.041 −0.021 −0.021 −0.006 0.039σρ(m) 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026QLBm 3.019 3.047 23.053 32.981 35.442 38.088 38.294 47.019

p-value 0.082 0.218 0.000 0.000 0.000 0.000 0.000 0.000

m 9 10 11 12 13 14 15 16ρ(m) 0.029 0.000 0.019 −0.003 0.023 −0.013 0.019 −0.022σρ(m) 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026QLBm 51.874 51.874 54.077 54.139 57.134 58.098 60.173 62.882

p-value 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Page 136: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

120 GARCH MODELS

Table 5.8 Studentized statistics for the corner method for the CAC 40 series and selectedARMA orders.

.p.|.q..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...1 | 0.8 -0.9 -2.0 0.7 -1.9 -1.0 -1.2 0.6 0.0 0.5 0.5 -0.7 0.9 1.2 1.8

2 | 0.9 0.8 1.1 -1.1 1.1 -0.3 0.8 0.2 -0.4 0.2 0.4 0.0 0.7 -0.1

3 | -2.0 1.1 -0.9 -1.0 -0.6 0.8 -0.5 0.4 -0.1 0.3 0.3 0.4 0.5

4 | -0.8 -1.1 1.0 -0.4 0.7 -0.5 0.2 0.4 0.4 0.3 -0.2 -0.3

5 | -2.0 1.1 -0.6 0.7 -0.6 0.3 -0.3 0.2 0.0 0.3 0.3

6 | 1.0 -0.3 -0.8 -0.5 -0.3 0.2 0.3 0.1 -0.2 0.3

7 | -1.1 0.7 -0.4 0.2 -0.3 0.3 -0.3 0.3 -0.3

8 | -0.4 0.0 -0.3 0.3 -0.1 -0.1 -0.3 0.4

9 | -0.1 -0.2 -0.1 0.3 -0.1 -0.1 -0.3

10 | -0.4 0.2 -0.3 0.2 -0.3 0.3

11 | 0.5 0.4 0.2 -0.1 0.2

12 | 0.8 0.1 -0.3 -0.3

13 | 1.0 0.8 0.5

14 | -1.1 -0.2

15 | 1.8

ARMA(P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 1, 1)

0.100000 1.64 ( 1, 1)

0.050000 1.96 ( 0, 3) ( 1, 1) ( 5, 0)

0.020000 2.33 ( 0, 0)

0.010000 2.58 ( 0, 0)

0.005000 2.81 ( 0, 0)

0.002000 3.09 ( 0, 0)

0.001000 3.29 ( 0, 0)

0.000100 3.72 ( 0, 0)

0.000010 4.26 ( 0, 0)

On the other hand, the assumption of strong white noise can be categorically rejected, thep-values (bottom of Table 5.7) being almost equal to zero. Table 5.8 confirms the identificationof an ARMA(0, 0) process for the CAC 40. Table 5.9 would lead us to select an ARMA(0, 0),ARMA(1, 1), AR(3) or MA(3) model for the FTSE 100. Recall that this a priori identificationstep should be completed by an estimation of the selected models, followed by a validationstep. For the CAC 40, Table 5.10 indicates that the most reasonable GARCH model is simplythe GARCH(1, 1). For the FTSE 100, plausible models are the GARCH(2, 1), GARCH(2, 2),GARCH(2, 3), or ARCH(4), as can be seen from Table 5.11. The choice between these modelsis the object of the estimation and validation steps.

5.6 Bibliographical Notes

In this chapter, we have adapted tools generally employed to deal with the identification ofARMA models. Correlograms and partial correlograms are studied in depth in the book byBrockwell and Davis (1991). In particular, they provide a detailed proof for the Bartlett for-mula giving the asymptotic behavior of the sample autocorrelations of a strong linear process.

Page 137: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 121

Table 5.9 Studentized statistics for the corner method for the FTSE 100 series and selectedARMA orders.

.p.|.q..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...1 | 0.8 -0.1 -2.6 1.7 -1.0 -1.0 -0.3 1.8 1.6 0.0 1.1 -0.2 1.3 -0.7 1.2

2 | 0.1 0.8 1.2 0.2 1.0 0.3 0.9 1.0 0.6 -0.9 0.5 -0.8 0.6 -0.4

3 | -2.6 1.2 -0.7 -0.6 -0.7 0.8 -0.4 0.5 0.7 0.3 -0.3 -0.1 0.2

4 | -1.8 0.3 0.6 -0.7 0.6 0.0 -0.4 0.4 0.6 -0.3 0.1 -0.1

5 | -1.1 1.1 -0.7 0.6 -0.6 0.5 -0.3 0.5 0.5 0.1 0.2

6 | 1.1 0.5 -0.8 0.2 -0.4 0.6 0.5 0.5 0.4 0.2

7 | 0.0 0.9 -0.2 -0.3 0.0 0.5 0.5 0.4 0.3

8 | -1.6 0.7 -0.3 0.2 -0.4 0.4 -0.4 0.3

9 | 1.4 0.5 0.6 0.5 0.4 0.3 0.2

10 | 0.0 -0.9 -0.4 -0.2 -0.1 0.0

11 | 1.2 0.6 0.0 0.0 0.1

12 | 0.2 -0.8 0.0 0.0

13 | 1.3 0.6 0.1

14 | 0.5 -0.6

15 | 1.1

ARMA(P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 0,13) ( 1, 1) ( 9, 0)

0.100000 1.64 ( 0, 8) ( 1, 1) ( 4, 0)

0.050000 1.96 ( 0, 3) ( 1, 1) ( 3, 0)

0.020000 2.33 ( 0, 3) ( 1, 1) ( 3, 0)

0.010000 2.58 ( 0, 3) ( 1, 1) ( 3, 0)

0.005000 2.81 ( 0, 0)

0.002000 3.09 ( 0, 0)

0.001000 3.29 ( 0, 0)

0.000100 3.72 ( 0, 0)

0.000010 4.26 ( 0, 0)

The generalized Bartlett formula (B.15) was established by Francq and Zakoıan (2009d). Thetextbook by Li (2004) can serve as a reference for the various portmanteau adequacy tests, aswell as Godfrey (1988) for the LM tests. It is now well known that tools generally used forthe identification of ARMA models should not be directly used in presence of conditional het-eroscedasticity, or other forms of dependence in the linear innovation process (see, for instance,Diebold, 1986; Romano and Thombs, 1996; Berlinet and Francq, 1997; or Francq, Roy andZakoıan, 2005). The corner method was proposed by Beguin, Gourieroux and Monfort (1980)for the identification of mixed ARMA models. There are many alternatives to the corner method,in particular the epsilon algorithm (see Berlinet, 1984) and the generalized autocorrelations ofGlasbey (1982).

Additional references on tests of ARCH effects are Engle (1982, 1984), Bera and Higgins(1997) and Li (2004).

In this chapter we have assumed the existence of a fourth-order moment for the observedprocess. When only the second-order moment exists, Basrak, Davis and Mikosch (2002)showed in particular that the sample autocorrelations converge very slowly. When even thesecond-order moment does not exist, the sample autocorrelations have a degenerate asymptoticdistribution.

Page 138: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

122 GARCH MODELS

Table 5.10 Studentized statistics for the corner method for the squared CAC 40 series andselected GARCH orders.

.max(p,q).|.p..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...1 | 5.2 5.4 5.0 5.3 4.6 4.7 5.4 4.6 4.5 4.1 3.2 3.9 3.7 5.2 3.9

2 | -4.6 0.6 0.9 -1.4 0.2 1.0 -0.4 0.5 -0.6 0.2 0.4 -1.0 1.3 -0.6

3 | 3.5 0.9 0.8 0.9 0.7 0.5 -0.2 -0.4 0.4 0.4 0.5 0.9 0.8

4 | -4.0 -1.5 -0.9 -0.4 0.0 0.4 -0.4 0.2 -0.3 -0.2 0.2 0.3

5 | 4.2 0.2 0.8 0.1 0.3 0.3 0.3 0.3 0.2 0.2 0.2

6 | -5.1 1.1 -0.6 0.4 -0.3 0.3 0.4 -0.2 -0.1 0.2

7 | 2.5 -0.3 -0.4 -0.4 0.3 0.4 0.2 0.1 0.2

8 | -3.5 0.5 0.3 0.3 -0.3 -0.1 -0.1 0.2

9 | 1.4 -0.9 0.4 -0.3 0.3 -0.1 0.1

10 | -3.4 0.3 -0.5 -0.2 -0.2 0.2

11 | 1.5 0.4 0.5 0.2 0.2

12 | -2.4 -1.0 -0.9 0.3

13 | 3.7 1.9 0.9

14 | -0.6 -0.1

15 | 0.1

GARCH(p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 2, 1) ( 2, 2) ( 0,13)

0.100000 1.64 ( 2, 1) ( 2, 2) ( 0,13)

0.050000 1.96 ( 1, 1) ( 0,13)

0.020000 2.33 ( 1, 1) ( 0,13)

0.010000 2.58 ( 1, 1) ( 0,13)

0.005000 2.81 ( 1, 1) ( 0,13)

0.002000 3.09 ( 1, 1) ( 0,13)

0.001000 3.29 ( 1, 1) ( 0,13)

0.000100 3.72 ( 1, 1) ( 0, 6)

0.000010 4.26 ( 1, 1) ( 0, 6)

Concerning the HAC estimators of a long-run variance matrix, see, for instance, Andrews(1991) and Andrews and Monahan (1992). The method based on the spectral density at 0 of anAR model follows from Berk (1974). A comparison with the HAC method is proposed in denHann and Levin (1997).

5.7 Exercises

5.1 (Asymptotic behavior of the SACVs of a martingale difference)Let (εt ) denote a martingale difference sequence such that Eε4

t <∞ and γ (h) =n−1∑n

t=1 εtεt+h. By applying Corollary A.1, derive the asymptotic distribution of n1/2γ (h)

for h = 0.

5.2 (Asymptotic behavior of n1/2γ (1) for an ARCH(1) process)Consider the stationary nonanticipative solution of an ARCH(1) process{

εt = σtηtσ 2t = ω + αε2

t−1,(5.28)

Page 139: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 123

Table 5.11 Studentized statistics for the corner method for the squared FTSE 100 series andselected GARCH orders.

.max(p,q).|.p..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...1 | 5.7 11.7 5.8 12.9 2.2 3.8 2.5 2.9 2.8 3.9 2.3 2.9 1.9 3.6 2.3

2 | -5.2 3.3 -2.9 2.2 -4.6 4.3 -1.9 1.5 -1.7 2.7 -1.0 -0.2 0.3 -0.2

3 | -0.1 -7.7 1.3 -0.2 0.6 -2.3 0.5 0.3 0.6 1.6 0.5 0.3 0.2

4 | -8.5 4.2 -0.1 -0.4 -0.1 1.2 -0.3 -0.7 -0.3 1.7 0.1 0.1

5 | -0.3 -1.6 0.5 -0.2 0.6 -0.9 0.7 -0.2 0.8 1.4 -0.1

6 | -1.9 1.6 0.6 1.4 0.9 0.4 -0.7 0.9 -1.4 1.2

7 | 0.7 -1.0 -1.0 -0.8 0.3 -0.6 0.5 0.6 1.1

8 | -1.2 0.7 -0.3 0.5 -0.6 0.7 -0.8 -0.5

9 | -0.3 -1.0 0.5 0.1 -1.3 -0.4 1.1

10 | -1.6 1.2 -0.8 0.9 -0.9 1.1

11 | 0.6 0.7 0.7 0.2 1.1

12 | 1.8 -0.4 -0.9 -1.2

13 | 1.2 0.9 0.8

14 | 0.3 -0.9

15 | 0.8

GARCH(p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 1, 6) ( 0,12)

0.100000 1.64 ( 1, 4) ( 0,12)

0.050000 1.96 ( 2, 3) ( 0, 4)

0.020000 2.33 ( 2, 1) ( 2, 2) ( 0, 4)

0.010000 2.58 ( 2, 1) ( 2, 2) ( 0, 4)

0.005000 2.81 ( 2, 1) ( 2, 2) ( 0, 4)

0.002000 3.09 ( 2, 1) ( 2, 2) ( 0, 4)

0.001000 3.29 ( 2, 1) ( 2, 2) ( 0, 4)

0.000100 3.72 ( 2, 1) ( 2, 2) ( 0, 4)

0.000010 4.26 ( 2, 1) ( 2, 2) ( 1, 3) ( 0, 4)

where (ηt ) is a strong white noise with unit variance and µ4α2 < 1 with µ4 = Eη4

t . Derivethe asymptotic distribution of n1/2γ (1).

5.3 (Asymptotic behavior of n1/2ρ(1) for an ARCH(1) process)For the ARCH(1) model of Exercise 5.2, derive the asymptotic distribution of n1/2ρ(1). Whatis the asymptotic variance of this statistic when α = 0? Draw this asymptotic variance as afunction of α and conclude accordingly.

5.4 (Asymptotic behavior of the SACRs of a GARCH(1, 1) process)For the GARCH(1, 1) model of Exercise 2.8, derive the asymptotic distribution of n1/2ρ(h),for h = 0 fixed.

5.5 (Moment of order 4 of a GARCH(1, 1) process)For the GARCH(1, 1) model of Exercise 2.8, compute Eεtεt+1εsεs+2.

5.6 (Asymptotic covariance between the SACRs of a GARCH(1, 1) process)For the GARCH(1, 1) model of Exercise 2.8, compute

Cov{n1/2ρ(1), n1/2ρ(2)

}.

Page 140: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

124 GARCH MODELS

5.7 (First five SACRs of a GARCH(1, 1) process)Evaluate numerically the asymptotic variance of the vector

√nρ5 of the first five SACRs of

the GARCH(1, 1) model defined by{εt = σtηt , ηt iid N(0, 1)σ 2t = 1+ 0.3ε2

t−1 + 0.55σ 2t−1.

5.8 (Generalized Bartlett formula for an MA(q)-ARCH(1) process)Suppose that Xt follows an MA(q) of the form

Xt = εt −q∑i=1

biεt−i ,

where the error term is an ARCH(1) process

εt = σtηt , σ 2t = ω + αε2

t−1, ηt iid N(0, 1), α2 < 1/3.

How is the generalized Bartlett formula (B.15) expressed for i = j >q?

5.9 (Fisher information matrix for dynamic regression model)In the regression model Yt = Fθ0 (Wt )+ εt introduced on page 112, suppose that (εt ) is aN(0, σ 2

0 ) white noise. Suppose also that the regularity conditions entailing (5.14) hold. Givean explicit form to the blocks of the matrix I , and consider the case where σ 2 does notdepend on θ .

5.10 (LM tests in a linear regression model)Consider the regression model

Y = X1β1 +X2β2 + U,

where Y = (Y1, . . . , Yn) is the dependent vector variable, Xi is an n× ki matrix of explicativevariables with rank ki (i = 1, 2), and the vector U is a N(0, σ 2In) error term. Derive the LMtest of the hypothesis H0 : β2 = 0. Consider the case X′1X2 = 0 and the general case.

5.11 (Centered and uncentered R2)Consider the regression model

Yt = β1Xt1 + · · · + βkXtk + εt , t = 1, . . . , n,

where the εt are iid, centered, and have a variance σ 2 > 0. Let Y = (Y1, . . . , Yn)′ be the

vector of dependent variables, X = (Xij ) the n× k matrix of explanatory variables, ε =(ε1, . . . , εn)

′ the vector of the error terms and β = (β1, . . . , βk)′ the parameter vector. Let

PX = X(X′X)−1X′ denote the orthogonal projection matrix on the vector subspace generatedby the columns of X.The uncentered determination coefficient is defined by

R2nc =

∥∥Y∥∥2

‖Y‖2 , Y = PXY (5.29)

and the (centered) determination coefficient is defined by

R2 =∥∥Y − ye

∥∥2

‖Y − ye‖2 , e = (1, . . . , 1)′, y = 1

n

n∑t=1

Yt . (5.30)

Page 141: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

IDENTIFICATION 125

Let T denote a k × k invertible matrix, c a number different from 0 and d any number.Let Y = cY + de and X = XT . Show that if y = 0 and if e belongs to the vector subspacegenerated by the columns of X, then R2

nc defined by (5.29) is equal to the determinationcoefficient in the regression of Y on the columns of X.

5.12 (Identification of the DAX and the S&P 500)From the address http://fr.biz.yahoo.com//bourse/accueil.html download theseries of DAX and S&P 500 stock indices. Carry out a study similar to that of Section 5.5and deduce a selection of plausible models.

Page 142: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 143: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

6

Estimating ARCH Modelsby Least Squares

The simplest estimation method for ARCH models is that of ordinary least squares (OLS). Thisestimation procedure has the advantage of being numerically simple, but has two drawbacks: (i) theOLS estimator is not efficient and is outperformed by methods based on the likelihood or on thequasi-likelihood that will be presented in the next chapters; (ii) in order to provide asymptoticallynormal estimators, the method requires moments of order 8 for the observed process. An extensionof the OLS method, the feasible generalized least squares (FGLS) method, suppresses the firstdrawback and attenuates the second by providing estimators that are asymptotically as accurateas the quasi-maximum likelihood under the assumption that moments of order 4 exist. Note thatthe least-squares methods are of interest in practice because they provide initial estimators for theoptimization procedure that is used in the quasi-maximum likelihood method.

We begin with the unconstrained OLS and FGLS estimators. Then, in Section 6.3, we will seehow to take into account positivity constraints on the parameters.

6.1 Estimation of ARCH(q) models by OrdinaryLeast Squares

In this section, we consider the OLS estimator of the ARCH(q) model:

εt = σtηt ,

σ 2t = ω0 +

q∑i=1

α0iε2t−i with ω0 > 0, α0i ≥ 0, i = 1, . . . , q, (6.1)

(ηt ) is an iid sequence, E(ηt ) = 0, Var(ηt ) = 1.

The OLS method uses the AR representation on the squares of the observed process. No assumptionis made on the law of ηt .

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 144: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

128 GARCH MODELS

The true value of the vector of the parameters is denoted by θ0 = (ω0, α01, . . . , α0q)′ and we

denote by θ a generic value of the parameter.From (6.1) we obtain the AR(q) representation

ε2t = ω0 +

q∑i=1

α0iε2t−i + ut , (6.2)

where ut = ε2t − σ 2

t = (η2t − 1)σ 2

t . The sequence (ut ,Ft )t constitutes a martingale difference whenEε2

t = σ 2t <∞, denoting by Ft the σ -field generated by {εs : s ≤ t}.

Assume that we observe ε1, . . . , εn, a realization of length n of the process (εt ), and letε0, . . . , ε1−q be initial values. For instance, the initial values can be chosen equal to zero. Intro-ducing the vector

Z′t−1 =(1, ε2

t−1, . . . , ε2t−q),

in view of (6.2) we obtain the system

ε2t = Z′t−1θ0 + ut , t = 1, . . . , n, (6.3)

which can be written asY = Xθ0 + U,

with the n× q matrix

X =

1 ε20 . . . ε2

−q+1...

1 ε2n−1 . . . ε2

n−q

= Z′0

...

Z′n−1

and the n× 1 vectors

Y =

ε21...

ε2n

, U =

u1...

un

.

Assume that the matrix X′X is invertible, or equivalently that X has full column rank (we willsee that this is always the case asymptotically, and thus for n large enough). The OLS estimatorof θ0 follows:

θn := arg minθ‖Y −Xθ‖2 = (X′X)−1X′Y. (6.4)

Under assumptions OLS1 and OLS2 below the variance of ut exists and is constant. The OLSestimator of σ 2

0 = Var(ut ) is

σ 2 = 1

n− q − 1‖Y −Xθn‖2 = 1

n− q − 1

n∑t=1

{ε2t − ω −

q∑i=1

αiε2t−i

}2

.

Remark 6.1 (OLS estimator of a GARCH model) An OLS estimator can also be defined fora GARCH (p, q) model, but the estimator is not explicit, because ε2

t does not satisfy an AR modelwhen p = 0 (see Exercise 7.5).

To establish the consistency of the OLS estimators of θ0 and σ 20 , we must consider the following

assumptions.

Page 145: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ESTIMATING ARCH MODELS BY LEAST SQUARES 129

OLS1: (εt ) is the nonanticipative strictly stationary solution of model (6.1), and ω0 > 0.

OLS2: Eε4t < +∞.

OLS3: P(η2t = 1) = 1.

Explicit conditions for assumptions OLS1 and OLS2 were given in Chapter 2. AssumptionOLS3 that the law of ηt is nondegenerate allows us to identify the parameters. The assumptionalso guarantees the invertibility of X′X for n large enough.

Theorem 6.1 (Consistency of the OLS estimator of an ARCH model) Let (θn) be a sequenceof estimators satisfying (6.4). Under assumptions OLS1–OLS3, almost surely

θn → θ0, σ 2n → σ 2

0 , as n→∞.

Proof. The proof consists of several steps.

(i) We have seen (Theorem 2.4) that (εt ), the unique nonanticipative stationary solution of themodel, is ergodic. The process (Zt ) is also ergodic because Zt is a measurable function of {εt−i , i ≥0}. The ergodic theorem (see Theorem A.2) then entails that

1

nX′X = 1

n

n∑t=1

Zt−1Z′t−1 → E(Zt−1Z

′t−1), a.s., as n→∞. (6.5)

The existence of the expectation is guaranteed by assumption OLS3. Note that the initial valuesare involved only in a fixed number of terms of the sum, and thus they do not matter for theasymptotic result. Similarly, we have

1

nX′U = 1

n

n∑t=1

Zt−1ut → E(Zt−1ut ), a.s., as n→∞.

(ii) The invertibility of the matrix EZt−1Z′t−1 = EZtZ

′t is shown by contradiction. Assume that

there exists a nonzero vector c of Rq+1 such that c′EZtZ′t c = 0. Thus E{c′Zt }2 = 0, and it follows

that c′Zt = 0 a.s. Therefore, there exists a linear combination of the variables ε2t , . . . , ε

2t−q+1

which is a.s. equal to a constant. Without loss of generality, one can assume that, in this linearcombination, the coefficient of ε2

t = η2t σ

2t is 1. Thus η2

t is a.s. a measurable function of the variablesεt−1, . . . , εt−q . However, the solution being nonanticipative, η2

t is independent of these variables.This implies that η2

t is a.s. equal to a constant. This constant is necessarily equal to 1, but thisleads to a contradiction with OLS3. Thus E(Zt−1Z

′t−1) is invertible.

(iii) The innovation of ε2t being ut = ε2

t − σ 2t = ε2

t − E(ε2t | Ft−1), we have the orthogonality

relationsE(ut) = E(utε

2t−1) = . . . = E(ut ε

2t−q) = 0

that isE(Zt−1ut ) = 0.

(iv) Point (ii) shows that n−1X′X is a.s. invertible, for n large enough and that, almost surely, asn→∞,

θn − θ0 =(X′Xn

)−1X′Un

→ {E(Zt−1Z

′t−1)

}−1E(Zt−1ut ) = 0.

Page 146: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

130 GARCH MODELS

For the asymptotic normality of the OLS estimator, we need the following additional assumption.

OLS4: E(ε8t ) < +∞.

Consider the (q + 1)× (q + 1) matrices

A = E(Zt−1Z′t−1), B = E(σ 4

t Zt−1Z′t−1).

The invertibility of A was established in the proof of Theorem 6.1, and the invertibility of B isshown by the same argument, noting that c′σ 2

t Zt−1 = 0 if and only if c′Zt−1 = 0 because σ 2t > 0

a.s. The following result establishes the asymptotic normality of the OLS estimator.Let κη = Eη4

t .

Theorem 6.2 (Asymptotic normality of the OLS estimator) Under assumptions OLS1–OLS4,

√n(θn − θ0)

L→ N(0, (κη − 1)A−1BA−1).

Proof. In view of (6.3), we have

θn =(

1

n

n∑t=1

Zt−1Z′t−1

)−1 (1

n

n∑t=1

Zt−1ε2t

)

=(

1

n

n∑t=1

Zt−1Z′t−1

)−1 {1

n

n∑t=1

Zt−1(Z′t−1θ0 + ut )

}

= θ0 +(

1

n

n∑t=1

Zt−1Z′t−1

)−1 {1

n

n∑t=1

Zt−1ut

}.

Thus

√n(θn − θ0) =

(1

n

n∑t=1

Zt−1Z′t−1

)−1 {1√n

n∑t=1

Zt−1ut

}. (6.6)

Let λ ∈ Rq+1, λ = 0. The sequence (λ′Zt−1ut ,Ft ) is a square integrable ergodic stationary mar-tingale difference, with variance

Var(λ′Zt−1ut ) = λ′E(Zt−1Z′t−1u

2t )λ = λ′E

{Zt−1Z

′t−1(η

2t − 1)2σ 4

t

= (κη − 1)λ′Bλ.

By the CLT (see Corollary A.1) we obtain that, for all λ = 0,

1√n

n∑t=1

λ′Zt−1utL→ N(0, (κη − 1)λ′Bλ).

Using the Cramer–Wold device, it follows that

1√n

n∑t=1

Zt−1utL→ N(0, (κη − 1)B). (6.7)

The conclusion follows from (6.5), (6.6) and (6.7). �

Page 147: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ESTIMATING ARCH MODELS BY LEAST SQUARES 131

Remark 6.2 (Estimation of the information matrices) Consistent estimators A and B of thematrices A and B are obtained by replacing the theoretical moments by their empirical counterparts,

A = 1

n

n∑t=1

Zt−1Z′t−1, B = 1

n

n∑t=1

σ 4t Zt−1Z

′t−1,

where σ 2t = Z′t−1θn. The fourth order moment of the process ηt = εt/σt is also consistently esti-

mated by µ4 = n−1∑nt=1(εt /σt )

4. Finally, a consistent estimator of the asymptotic variance of theOLS estimator is defined by

Varas{√n(θn − θ0)} = (µ4 − 1)A−1I A−1.

Example 6.1 (ARCH(1)) When q = 1 the moment conditions OLS2 and OLS4 take the formκηα

201 < 1 and µ8α

401 < 1 (see (2.54)). We have

A =(

1 Eε2t−1

Eε2t−1 Eε4

t−1

), B =

(Eσ 4

t Eσ 4t ε

2t−1

Eσ 4t ε

2t−1 Eσ 4

t ε4t−1

),

with

Eε2t =

ω0

1− α01, Eε4

t = κηEσ4t =

ω20(1+ α01)

(1− κηα201)(1− α01)

κη.

The other terms of the matrix B are obtained by expanding σ 4t = (ω0 + α01ε

2t−1)

2 and calculatingthe moments of order 6 and 8 of ε2

t .

Table 6.1 shows, for different laws of the iid process, that the moment conditions OLS2 andOLS4 impose strong constraints on the parameter space.

Table 6.2 displays numerical values of the asymptotic variance, for different values of α01 andω0 = 1, when ηt follows the normal N(0, 1).

The asymptotic accuracy of θn becomes very low near the boundary of the domain of existenceof Eε8

t . The OLS method can, however, be used for higher values of α01, because the estimatorremains consistent when α01 < 3−1/2 = 0.577, and thus can provide initial values for an algorithmmaximizing the likelihood.

Table 6.1 Strict stationarity and moment conditions for the ARCH(1) model when ηt followsthe N(0, 1) distribution or the Student t distribution (normalized in such a way that Eη2

t = 1).

Strict stationarity Eε2t <∞ Eε4

t <∞ Eε8t <∞

Normal α01 < 3.562 α01 < 1 α01 < 0.577 α01 < 0.312t3 α01 < 7.389 α01 < 1 no not5 α01 < 4.797 α01 < 1 α01 < 0.333 not9 α01 < 4.082 α01 < 1 α01 < 0.488 α01 < 0.143

‘no’ means that the moment condition is not satisfied.

Table 6.2 Asymptotic variance of the OLS estimator of an ARCH(1) model with ω0 = 1, whenηt ∼ N(0, 1).

α01 0.1 0.2 0.3

Varas{√n(θn − θ0)}(

3.98 −1.85−1.85 2.15

) (8.03 −5.26

−5.26 5.46

) (151.0 −106.5

−106.5 77.6

)

Page 148: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

132 GARCH MODELS

6.2 Estimation of ARCH(q) Models by FeasibleGeneralized Least Squares

In a linear regression model when, conditionally on the exogenous variables, the errors areheteroscedastic, the FGLS estimator is asymptotically more accurate than the OLS estimator. Notethat in (6.3) the errors ut are, conditionally on Zt−1, heteroscedastic with conditional varianceVar(ut | Zt−1) = (κη − 1)σ 4

t .For all θ = (ω, α1, . . . , αq)

′, let

σ 2t (θ) = ω +

q∑i=1

αiε2t−i and � = diag(σ−4

1 (θn), . . . , σ−4n (θn)).

The FGLS estimator is defined by

θn = (X′�X)−1X′�Y.

Theorem 6.3 (Asymptotic properties of the FGLS estimator) Under assumptions OLS1–OLS3 and if α0i > 0, i = 1, . . . , q,

θn → θ0, a.s.,√n(θn − θ0)

L→ N(0, (κη − 1)J−1),

where J = E(σ−4t Zt−1Z

′t−1) is positive definite.

Proof. It can be shown that J is positive definite by the argument used in Theorem 6.1.We have

θn =(

1

n

n∑t=1

σ−4t (θn)Zt−1Z

′t−1

)−1 (1

n

n∑t=1

σ−4t (θn)Zt−1ε

2t

)

=(

1

n

n∑t=1

σ−4t (θn)Zt−1Z

′t−1

)−1 {1

n

n∑t=1

σ−4t (θn)Zt−1(Z

′t−1θ0 + ut )

}

= θ0 +(

1

n

n∑t=1

σ−4t (θn)Zt−1Z

′t−1

)−1 {1

n

n∑t=1

σ−4t (θn)Zt−1ut

}. (6.8)

A Taylor expansion around θ0 yields, with σ 2t = σ 2

t (θ0),

σ−4t (θn) = σ−4

t − 2σ−6t (θ∗)

∂σ 2t

∂θ ′(θ∗)(θn − θ0), (6.9)

where θ∗ is between θn and θ0. Note that, for all θ , ∂σ 2t

∂θ(θ) = Zt−1. It follows that

1

n

n∑t=1

σ−4t (θn)Zt−1Z

′t−1 =

1

n

n∑t=1

σ−4t Zt−1Z

′t−1 −

2

n

n∑t=1

σ−6t (θ∗)Zt−1Z

′t−1 × Z′t−1(θn − θ0).

The first term on the right-hand side of the equality converges a.s. to J by the ergodic theorem.The second term converges a.s. to 0 because the OLS estimator is consistent and

Page 149: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ESTIMATING ARCH MODELS BY LEAST SQUARES 133

∥∥∥∥∥ 1

n

n∑t=1

σ−6t (θ∗)Zt−1Z

′t−1 × Z′t−1(θn − θ0)

∥∥∥∥∥ ≤(

1

n

n∑t=1

‖σ−2t (θ∗)Zt−1‖3

)‖θn − θ0‖

≤ K‖θn − θ0‖,

for n large enough. The constant bound K is obtained by arguing that the components of θn, andthus those of θ∗, are strictly positive for n large enough (because θn → θ0 a.s.). Thus, we haveσ−2t (θ∗)ε2

t−i < 1/θ∗i , for i = 1, . . . , q, and finally ‖σ−2t (θ∗)Zt−1‖ is bounded. We have shown

that a.s. (1

n

n∑t=1

σ−4t (θn)Zt−1Z

′t−1

)−1

→ J−1. (6.10)

For the term in braces in (6.8) we have

1

n

n∑t=1

σ−4t (θn)Zt−1ut

= 1

n

n∑t=1

σ−4t Zt−1ut − 2

n

n∑t=1

σ−6t (θ∗)Zt−1ut × Z′t−1(θn − θ0) (6.11)

→ 0, a.s.,

by the previous arguments, noting that E(σ−4t Zt−1ut ) = 0 and∥∥∥∥∥2

n

n∑t=1

σ−6t (θ∗)Zt−1ut × Z′t−1(θn − θ0)

∥∥∥∥∥=∥∥∥∥∥2

n

n∑t=1

σ−6t (θ∗)Zt−1σ

2t (θ0)(η

2t − 1)× Z′t−1(θn − θ0)

∥∥∥∥∥≤ K

(1

n

n∑t=1

|η2t − 1|

)∥∥θn − θ0∥∥→ 0, a.s.

Thus, we have shown that θn → θ0, a.s.Using (6.11), (6.8) and (6.10), we have

√n(θn − θ0) = (J−1 + Rn)

{1√n

n∑t=1

σ−4t Zt−1ut

}

− 2

n(J−1 + Rn)

n∑t=1

σ−6t (θ∗)Zt−1ut × Z′t−1

√n(θn − θ0),

where Rn → 0, a.s. A new expansion around θ0 gives

σ−6t (θ∗) = σ−6

t − 3σ−8t (θ∗∗)Z′t−1(θn − θ0), (6.12)

Page 150: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

134 GARCH MODELS

where θ∗∗ is between θ∗ and θ0. It follows that

√n(θn − θ0)

= (J−1 + Rn)

{1√n

n∑t=1

σ−2t Zt−1(η

2t − 1)

}

− 2

n(J−1 + Rn)

n∑t=1

σ−4t Zt−1(η

2t − 1)× Z′t−1

√n(θn − θ0)

+ 6

n3/2(J−1 + Rn)

n∑t=1

σ−8t (θ∗∗)Zt−1ut × {Z′t−1

√n(θn − θ0)} × {Z′t−1

√n(θ∗ − θ0)}

:= Sn1 + Sn2 + Sn3. (6.13)

The CLT applied to the ergodic and square integrable stationary martingale differenceσ−2t Zt−1(η

2t − 1) shows that Sn1 converges in distribution to a Gaussian vector with zero mean

and varianceJ−1E{σ−4

t (η2t − 1)2Zt−1Z

′t−1}J−1 = (κη − 1)J−1

(see Corollary A.1). Moreover,

1

n

n∑t=1

σ−4t Zt−1(η

2t − 1)× Z′t−1

√n(θn − θ0)

={

1

n

n∑t=1

σ−4t Zt−1(η

2t − 1)

}√n(ωn − ω0)

+q∑

j=1

{1

n

n∑t=1

σ−4t Zt−1(η

2t − 1)ε2

t−j

}√n(αnj − α0j ).

The two terms in braces tend to 0 a.s. by the ergodic theorem. Moreover, the terms√n(ωn − ω0)

and√n(αnj − α0j ) are bounded in probability, as well as J−1 + Rn. It follows that Sn2 tends to

0 in probability. Finally, by arguments already used and because θ∗ is between θn and θ0,

‖Sn3‖ ≤ K

n1/2‖J−1 + Rn‖‖

√n(θn − θ0)‖2 1

n

n∑t=1

|η2t − 1| → 0,

in probability. Using (6.12), we have shown the convergence in law of the theorem. �

The moment condition required for the asymptotic normality of the FGLS estimator isE(ε4

t ) <∞. For the OLS estimator we had the more restrictive condition Eε8t <∞. Moreover,

when this eighth-order moment exists, the following result shows that the OLS estimator isasymptotically less accurate than the FGLS estimator.

Theorem 6.4 (Asymptotic OLS versus FGLS variances) Under assumptions OLS1–OLS4,the matrix

A−1BA−1 − J−1

is positive semi-definite.

Page 151: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ESTIMATING ARCH MODELS BY LEAST SQUARES 135

Proof. Let D = σ 2t A

−1Zt−1 − σ−2t J−1Zt−1. Then

E(DD′) = A−1E(σ 4t Zt−1Z

′t−1)A

−1 + J−1E(σ−4t Zt−1Z

′t−1)J

−1

−A−1E(Zt−1Z′t−1)J

−1 − J−1E(Zt−1Z′t−1)A

−1

= A−1BA−1 − J−1

is positive semi-definite, and the result follows. �

We will see in Chapter 7 that the asymptotic variance of the FGLS estimator coincides with thatof the quasi-maximum likelihood estimator (but the asymptotic normality of the latter is obtainedwithout moment conditions). This result explains why quasi-maximum likelihood is preferred toOLS (and even to FGLS) for the estimation of ARCH (and GARCH) models. Note, however, thatthe OLS estimator often provides a good initial value for the optimization algorithm required forthe quasi-maximum likelihood method.

6.3 Estimation by Constrained Ordinary Least Squares

Negative components are not precluded in the OLS estimator θn defined by (6.4) (see Exercise 6.3).When the estimate has negative components, predictions of the volatility can be negative. In orderto avoid this problem, we consider the constrained OLS estimator defined by

θ cn = arg minθ∈[0,∞)q+1

Qn(θ), Qn(θ) = 1

n‖Y −Xθ‖2.

The existence of θ cn is guaranteed by the continuity of the function Qn and the fact that

{nQn(θ)}1/2 ≥ ‖Xθ‖ − ‖Y‖ → ∞

as ‖θ‖ → ∞ and θ ≥ 0, whenever X has nonzero columns. Note that the latter condition is satisfiedat least for n large enough (see Exercise 6.5).

6.3.1 Properties of the Constrained OLS EstimatorThe following theorem gives a condition for equality between the constrained and unconstrainedestimators. The theorem is stated in the ARCH case but is true in a much more general framework.

Theorem 6.5 (Equality between constrained and unconstrained OLS) If X is of rank q + 1,the constrained and unconstrained estimators coincide, θ cn = θn, if and only if θn ∈ [0,+∞)q+1.

Proof. Since θn and θ cn are obtained by minimizing the same function Qn(·), and since θ cnminimizes this function on a smaller set, we have Qn(θn) ≤ Qn(θ

cn). Moreover, θ cn ∈ [0,+∞)q+1,

and we have Qn(θ) ≥ Qn(θcn), for all θ ∈ [0,+∞)q+1.

Suppose that the unconstrained estimation θn belongs to [0,+∞)q+1. In this case Qn(θn) =Qn(θ

cn). Because the unconstrained solution is unique, θ cn = θn.

The converse is trivial. �

We now give a way to obtain the constrained estimator from the unconstrained estimator.

Page 152: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

136 GARCH MODELS

Theorem 6.6 (Constrained OLS as a projection of OLS) If X has rank q + 1, the constrainedestimator θ cn is the orthogonal projection of θn on [0,+∞)q+1 with respect to the metricX′X, that is,

θ cn = arg minθ∈[0,+∞)q+1

(θn − θ)′X′X(θn − θ). (6.14)

Proof. If we denote by P the orthogonal projector on the columns of X, and M = In − P ,we have

nQ(θ) = ‖Y −Xθ‖2 = ‖P(Y −Xθ)‖2 + ‖M(Y −Xθ)‖2

= ‖X(θn − θ)‖2 + ‖MY‖2,

using properties of projections, Pythagoras’s theorem and PY = Xθn. The constrained estimationθ cn thus solves (6.14). Note that, since X has full column rank, a norm is well defined by‖x‖X′X =

√x ′X′Xx. The characterization (6.14) is equivalent to

θcn ∈ [0,+∞)q+1, ‖θn − θ cn‖X′X ≤ ‖θn − θ‖X′X, ∀θ ∈ [0,+∞)q+1. (6.15)

Since [0,+∞)q+1 is convex, θ cn exists, is unique and is the X′X-orthogonal projection of θn on[0,+∞)q+1. This projection is characterized by

θ cn ∈ [0,+∞)q+1 and⟨θn − θ cn, θ

cn − θ

⟩X′X ≥ 0, ∀θ ∈ [0,+∞)q+1 (6.16)

(see Exercise 6.9). This characterization shows that, when θn /∈ [0,+∞)q+1, the constrained esti-mation θ cn must lie at the boundary of [0,+∞)q+1. Otherwise it suffices to take θ ∈ [0,+∞)q+1

between θ cn and θn to obtain a scalar product equal to −1. �

The characterization (6.15) allows us to easily obtain the strong consistency of the constrainedestimator.

Theorem 6.7 (Consistency of the constrained OLS estimator) Under the assumptions ofTheorem 6.1, almost surely,

θ cn → θ0 as n→∞.

Proof. Since θ0 ∈ [0,+∞)q+1, in view of (6.15) we have

‖θn − θ cn‖X′X/n ≤ ‖θn − θ0‖X′X/n.

It follows that, using the triangle inequality,

‖θ cn − θ0‖X′X/n ≤ ‖θ cn − θn‖X′X/n + ‖θn − θ0‖X′X/n ≤ 2‖θn − θ0‖X′X/n.

Since, in view of Theorem 6.1, θn → θ0 a.s. and X′X/n converges a.s. to a positive def-inite matrix, it follows that ‖θn − θ0‖X′X/n → 0 and thus that ‖θn − θ cn‖X′X/n → 0 a.s. UsingExercise 6.12, the conclusion follows. �

Page 153: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ESTIMATING ARCH MODELS BY LEAST SQUARES 137

6.3.2 Computation of the Constrained OLS EstimatorWe now give an explicit way to obtain the constrained estimator. We have already seen that ifall the components of the unconstrained estimator θn are positive, we have θ cn = θn. Now supposethat one component of θn is negative, for instance the last one. Let

X = (X(1), X(2)), X(1) =

1 ε2

0 . . . ε2−q+2

1 ε21 . . . ε2

−q+3...

1 ε2n−1 . . . ε2

n−q+1

,

and

θn =(X′X

)−1X′Y =

(θ(1)n

αq

), θn =

(θ(1)n

0

)=( (

X(1)′X(1))−1

X(1)′Y

0

).

Note that θ (1)n = θ(1)n in general (see Exercise 6.11).

Theorem 6.8 (Explicit form of the constrained estimator) Assume that X has rank q + 1 andαq < 0. Then

θ (1)n ∈ [0,+∞)q ⇐⇒ θ cn = θn.

Proof. Let P (1) = X(1)(X(1)′X(1)

)−1X(1)′ be the projector on the columns of X(1) and let M(1) =

I − P (1). We have

θ ′nX′ =

(Y ′X(1)

(X(1)′X(1)

)−1, 0

)(X(1)′

X(2)′

)= Y ′P (1),

θ ′nX′X = (Y ′X(1), Y ′P (1)X(2)) ,

θ ′nX′X = Y ′X = (Y ′X(1), Y ′X(2)) ,(

θ ′n − θ ′n)X′X = (0, Y ′M(1)X(2)) .

Because θ ′neq+1 < 0, with eq+1 = (0, . . . , 0, 1)′, we have (θ ′n − θ ′n)eq+1 < 0. This can bewritten as (

θ ′n − θ ′n)X′X

(X′X

)−1eq+1 < 0,

or alternativelyY ′M(1)X(2){(X′X)−1}q+1,q+1 < 0.

Thus Y ′M(1)X(2) < 0. It follows that for all θ = (θ(1)′, θ (2))′ such that θ(2) ∈ [0,∞),⟨

θn − θn, θn − θ⟩X′X =

(θn − θn

)′X′X

(θn − θ

)= (0, Y ′M(1)X(2)) ( θ

(1)n − θ (1)

−θ(2))

= −θ(2)Y ′M(1)X(2) ≥ 0.

In view of (6.16), we have θ cn = θn because θn ∈ [0,+∞)q+1. �

Page 154: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

138 GARCH MODELS

6.4 Bibliographical Notes

The OLS method was proposed by Engle (1982) for ARCH models. The asymptotic properties ofthe OLS estimator were established by Weiss (1984, 1986), in the ARMA-GARCH framework,under eighth-order moments assumptions. Pantula (1989) also studied the asymptotic properties ofthe OLS method in the AR(1)-ARCH(q) case, and he gave an explicit form for the asymptoticvariance. The FGLS method was developed, in the ARCH case, by Bose and Mukherjee (2003)(see also Gourieroux, 1997). The convexity results used for the study of the constrained estimatorcan be found, for instance, in Moulin and Fogelman-Soulie (1979).

6.5 Exercises

6.1 (Estimating the ARCH(q) for q = 1, 2, . . . )Describe how to use the Durbin algorithm (B.7)–(B.9) to estimate an ARCH(q) modelby OLS.

6.2 (Explicit expression for the OLS estimator of an ARCH process)With the notation of Section 6.1, show that, when X has rank q, the estimatorθ = (X′X)−1X′Y is the unique solution of the minimization problem

θ = arg minθ∈Rq+1

n∑t=1

(ε2t − Z′t−1θ)

2.

6.3 (OLS estimator with negative values)Give a numerical example (with, for instance, n = 2) showing that the unconstrained OLSestimator of the ARCH(q) parameters (with, for instance, q = 1) can take negative values.

6.4 (Unconstrained and constrained OLS estimator of an ARCH(2) process)Consider the ARCH(2) model{

εt = σtηtσ 2t = ω + α1ε

2t−1 + α2ε

2t−2.

Let θ = (ω, α1, α2)′ be the unconstrained OLS estimator of θ = (ω, α1, α2)

′. Is it possibleto have

1. α1 < 0?

2. α1 < 0 and α2 < 0?

3. ω < 0, α1 < 0 and α2 < 0?

Let θ c = (ωc, αc1, αc2)′ be the OLS constrained estimator with αc1 ≥ 0 and αc2 ≥ 0. Consider

the following numerical example with n = 3 observations and two initial values: ε2−1 = 0,

ε20 = 1, ε2

1 = 0, ε22 = 1/2, ε2

3 = 1/2. Compute θ and θ c for these observations.

6.5 (The columns of the matrix X are nonzero)Show that if ω0 > 0, the matrix X cannot have a column equal to zero for n large enough.

6.6 (Estimating an AR(1) with ARCH(q) errors)Consider the model

Xt = φ0Xt−1 + εt , |φ0| < 1,

Page 155: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ESTIMATING ARCH MODELS BY LEAST SQUARES 139

where (εt ) is the strictly stationary solution of model (6.1) under the condition Eε4t <∞.

Show that the OLS estimator of φ is consistent and asymptotically normal. Is the assumptionEε4

t <∞ necessary in the case of iid errors?

6.7 (Inversion of a block matrix)

For a matrix partitioned as A =[A11 A12

A21 A22

], show that the inverse (when it exists) is of

the form

A−1 = A−1

11 + A−111 A12FA21A

−111 −A−1

11 A12F

−FA21A−111 F

,where F = (A22 − A21A

−111 A12)

−1.

6.8 (Does the OLS asymptotic variance depend on ω0?)

1. Show that for an ARCH(q) model E(ε2mt ) is proportional to ωm0 (when it exists).

2. Using Exercise 6.7, show that, for an ARCH(q) model, the asymptotic variance of theOLS estimator of the α0i does not depend on ω0.

3. Show that the asymptotic variance of the OLS estimator of ω0 is proportional to ω20.

6.9 (Properties of the projections on closed convex sets)Let E be an Hilbert space, with a scalar product 〈·, ·〉 and a norm ‖ · ‖. When C ⊂ E andx ∈ E, it is said that x∗ ∈ C is a best approximation of x on C if ‖x − x∗‖ = miny∈C ‖x − y‖.1. Show that if C is closed and convex, x∗ exists and is unique. This point is then called the

projection of x on C.

2. Show that x∗ satisfies the so-called variational inequalities:

∀y ∈ C, 〈x∗ − x, x∗ − y〉 ≤ 0. (6.17)

and prove that x∗ is the unique point of C satisfying these inequalities.

6.10 (Properties of the projections on closed convex cones)Recall that a subset K of the vectorial space E is a cone if, for all x ∈ K , and for all λ ≥ 0,we have λx ∈ K . Let K be a closed convex cone of the Hilbert space E.

1. Show that the projection x∗ of x on K (see Exercise 6.9) is characterized by{ 〈x − x∗, x∗〉 = 0,〈x − x∗, y〉 ≤ 0, ∀y ∈ K. (6.18)

2. Show that x∗ satisfies

(a) ∀x ∈ E,∀λ ≥ 0, (λx)∗ = λx∗.

(b) ∀x ∈ E, ‖x‖2 = ‖x∗‖2 + ‖x − x∗‖2, thus ‖x∗‖ ≤ ‖x‖.6.11 (OLS estimation of a subvector of parameters)

Consider the linear model Y = Xθ + U with the usual assumptions. Let M2 be the matrixof the orthogonal projection on the orthogonal subspace of X(2), where X = (X(1), X(2)).Show that the OLS estimator of θ(1) (where θ = (θ (1)

′, θ (2)

′)′, with obvious notation) is

θ(1)n =

(X(1)′M2X

(1))−1

X(1)′M2Y .

Page 156: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

140 GARCH MODELS

6.12 (A matrix result used in the proof of Theorem 6.7)Let (Jn) be a sequence of symmetric k × k matrices converging to a positive definite matrixJ . Let (Xn) be a sequence of vectors in Rk such that X′nJnXn → 0. Show that Xn → 0.

6.13 (Example of constrained estimator calculus)Take the example of Exercise 6.3 and compute the constrained estimator.

Page 157: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

7

Estimating GARCH Modelsby Quasi-Maximum Likelihood

The quasi-maximum likelihood (QML) method is particularly relevant for GARCH models becauseit provides consistent and asymptotically normal estimators for strictly stationary GARCH pro-cesses under mild regularity conditions, but with no moment assumptions on the observed process.By contrast, the least-squares methods of the previous chapter require moments of order 4 at least.

In this chapter, we study in details the conditional QML method (conditional on initial values).We first consider the case when the observed process is pure GARCH. We present an iterativeprocedure for computing the Gaussian log-likelihood, conditionally on fixed or random initialvalues. The likelihood is written as if the law of the variables ηt were Gaussian N(0, 1) (we referto pseudo- or quasi-likelihood), but this assumption is not necessary for the strong consistencyof the estimator. In the second part of the chapter, we will study the application of the methodto the estimation of ARMA-GARCH models. The asymptotic properties of the quasi-maximumlikelihood estimator (QMLE) are established at the end of the chapter.

7.1 Conditional Quasi-Likelihood

Assume that the observations ε1, . . . , εn constitute a realization (of length n) of a GARCH(p, q)process, more precisely a nonanticipative strictly stationary solution of

εt =√htηt

ht = ω0 +q∑i=1

α0iε2t−i +

p∑j=1

β0j ht−j , ∀t ∈ Z,(7.1)

where (ηt ) is a sequence of iid variables of variance 1, ω0 > 0, α0i ≥ 0 (i = 1, . . . , q), and β0j ≥ 0(j = 1, . . . , p). The orders p and q are assumed known. The vector of the parameters

θ = (θ1, . . . , θp+q+1)′ := (ω, α1, . . . , αq, β1, . . . , βp)

′ (7.2)

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 158: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

142 GARCH MODELS

belongs to a parameter space of the form

� ⊂ (0,+∞)× [0,∞)p+q . (7.3)

The true value of the parameter is unknown, and is denoted by

θ0 = (ω0, α01, . . . , α0q, β01, . . . , β0p)′.

To write the likelihood of the model, a distribution must be specified for the iid variables ηt .Here we do not make any assumption on the distribution of these variables, but we work witha function, called the (Gaussian) quasi-likelihood, which, conditionally on some initial values,coincides with the likelihood when the ηt are distributed as standard Gaussian. Given initial val-ues ε0, . . . , ε1−q, σ 2

0 , . . . , σ21−p to be specified below, the conditional Gaussian quasi-likelihood is

given by

Ln(θ) = Ln(θ; ε1, . . . , εn) =n∏t=1

1√2πσ 2

t

exp

(− ε2

t

2σ 2t

),

where the σ 2t are recursively defined, for t ≥ 1, by

σ 2t = σ 2

t (θ) = ω +q∑i=1

αiε2t−i +

p∑j=1

βj σ2t−j . (7.4)

For a given value of θ , under the second-order stationarity assumption, the unconditional variance(corresponding to this value of θ ) is a reasonable choice for the unknown initial values:

ε20 = · · · = ε2

1−q = σ 20 = · · · = σ 2

1−p =ω

1−∑q

i=1 αi −∑p

j=1 βj. (7.5)

Such initial values are, however, not suitable for IGARCH models, in particular, and more generallywhen the second-order stationarity is not imposed. Indeed, the constant (7.5) would then takenegative values for some values of θ . In such a case, suitable initial values are

ε20 = . . . = ε2

1−q = σ 20 = . . . = σ 2

1−p = ω (7.6)

or

ε20 = . . . = ε2

1−q = σ 20 = . . . = σ 2

1−p = ε21 . (7.7)

A QMLE of θ is defined as any measurable solution θn of

θn = arg maxθ∈�

Ln(θ).

Taking the logarithm, it is seen that maximizing the likelihood is equivalent to minimizing, withrespect to θ,

ln(θ) = n−1n∑t=1

t , where t = t (θ) = ε2t

σ 2t

+ log σ 2t (7.8)

Page 159: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 143

and σ 2t is defined by (7.4). A QMLE is thus a measurable solution of the equation

θn = arg minθ∈�

ln(θ). (7.9)

It will be shown that the choice of the initial values is unimportant for the asymptotic propertiesof the QMLE. However, in practice this choice may be important. Note that other methods arepossible for generating the sequence σ 2

t ; for example, by taking σ 2t = c0(θ)+

∑t−1i=1 ci(θ)ε

2t−i ,

where the ci(θ) are recursively computed (see Berkes, Horvath and Kokoszka, 2003b). Note thatfor computing ln(θ), this procedure involves a number of operations of order n2, whereas theone we propose involves a number of order n. It will be convenient to approximate the sequence( t (θ)) by an ergodic stationary sequence. Assuming that the roots of Bθ (z) are outside the unitdisk, the nonanticipative and ergodic strictly stationary sequence

(σ 2t

)t= {σ 2

t (θ)}t

is defined asthe solution of

σ 2t = ω +

q∑i=1

αiε2t−i +

p∑j=1

βjσ2t−j , ∀t. (7.10)

Note that σ 2t (θ0) = ht .

Likelihood Equations

Likelihood equations are obtained by canceling the derivative of the criterion ln(θ) with respectto θ , which gives

1

n

n∑t=1

{ε2t − σ 2

t }1

σ 4t

∂σ 2t

∂θ= 0. (7.11)

These equations can be interpreted as orthogonality relations, for large n. Indeed, as will be seenin the next section, the left-hand side of equation (7.11) has the same asymptotic behavior as

1

n

n∑t=1

{ε2t − σ 2

t }1

σ 4t

∂σ 2t

∂θ,

the impact of the initial values vanishing as n→∞.The innovation of ε2

t is νt = ε2t − h2

t . Thus, under the assumption that the expectation exists,we have

Eθ0

(νt

1

σ 4t (θ0)

∂σ 2t (θ0)

∂θ

)= 0,

because 1σ4t (θ0)

∂σ2t (θ0)

∂θis a measurable function of the εt−i , i > 0. This result can be viewed as the

asymptotic version of (7.11) at θ0, using the ergodic theorem.

7.1.1 Asymptotic Properties of the QMLEIn this chapter, we will use the matrix norm defined by ‖A‖ =∑ |aij | for all matrices A = (aij ).The spectral radius of a square matrix A is denoted by ρ(A).

Page 160: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

144 GARCH MODELS

Strong Consistency

Recall that model (7.1) admits a strictly stationary solution if and only if the sequence of matricesA0 = (A0t ), where

A0t =

α01η2t · · · α0qη

2t β01η

2t · · · β0pη

2t

1 0 · · · 0 0 · · · 00 1 · · · 0 0 · · · 0

.... . .

. . ....

.... . .

. . ....

0 . . . 1 0 0 . . . 0 0

α01 · · · α0q β01 · · · β0p

0 · · · 0 1 0 · · · 00 · · · 0 0 1 · · · 0

.... . .

. . ....

.... . .

. . ....

0 . . . 0 0 0 . . . 1 0

,

admits a strictly negative top Lyapunov exponent, γ (A0) < 0, where

γ (A0) := inft∈N∗

1

tE(log ‖A0tA0t−1 . . . A01‖)

= limt→∞ a.s.

1

tlog ‖A0tA0t−1 . . . A01‖. (7.12)

Let

Aθ (z) =q∑i=1

αizi and Bθ (z) = 1−

p∑j=1

βj zj .

By convention, Aθ (z) = 0 if q = 0 and Bθ (z) = 1 if p = 0. To show strong consistency, thefollowing assumptions are used.

A1: θ0 ∈ � and � is compact.

A2: γ (A0) < 0 and for all θ ∈ �,∑p

j=1 βj < 1.

A3: η2t has a nondegenerate distribution and Eη2

t = 1.

A4: If p> 0, Aθ0(z) and Bθ0(z) have no common roots, Aθ0 (1) = 0, and α0q + β0p = 0.

Note that, by Corollary 2.2, the second part of assumption A2 implies that the roots of Bθ (z)are outside the unit disk. Thus, a nonanticipative and ergodic strictly stationary sequence

(σ 2t

)t

isdefined by (7.10). Similarly, define

ln(θ) = ln(θ; εn, εn−1 . . . , ) = n−1n∑t=1

t , t = t (θ) = ε2t

σ 2t

+ log σ 2t .

Example 7.1 (Parameter space of a GARCH(1, 1) process) In the case of a GARCH(1, 1)process, assumptions A1 and A2 hold true when, for instance, the parameter space is of the form

� = [δ, 1/δ]× [0, 1/δ]× [0, 1− δ],

where δ ∈ (0, 1) is a constant, small enough so that the true value θ0 = (ω0, α0, β0)′ belongs

to �. Figure 7.1 displays, in the plane (α, β), the zones of strict stationarity (when ηt is N(0, 1)

Page 161: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 145

1

1 2eg = 3.56

b

q0

b

aa

Figure 7.1 GARCH(1, 1): zones of strict and second-order stationarity and parameter space� = [ω,ω]× [0, α]× [0, β].

distributed) and of second-order stationarity, as well as an example of a parameter space � (thegray zone) compatible with assumptions A1 and A2.

The first result states the strong consistency of θn. The proof of this theorem, and of the next ones,is given in Section 7.4.

Theorem 7.1 (Strong consistency of the QMLE) Let (θn) be a sequence of QMLEs satisfying(7.9), with initial conditions (7.6) or (7.7). Under assumptions A1–A4, almost surely

θn → θ0, as n→∞.

Remark 7.1

1. It is not assumed that the true value of the parameter θ0 belongs to the interior of �. Thus,the theorem allows to handle cases where some coefficients, αi or βj , are null.

2. It is important to note that the strict stationarity condition is only assumed at θ0, notover all �. In view of Corollary 2.2, the condition

∑p

j=1 βj < 1 is weaker than the strictstationarity condition.

3. Assumption A4 disappears in the ARCH case. In the general case, this assumption allowsfor an overidentification of either of the two orders, p or q, but not of both. We thenconsistently estimate the parameters of a GARCH(p − 1, q) (or GARCH(p, q − 1)) processif an overparameterized GARCH(p, q) model is used.

4. When p = 0, assumption A4 precludes the case where all the α0i are zero. In such a case,the strictly stationary solution of the model is the strong white noise, which can be writtenin multiple forms. For instance, a strong white noise of variance 1 can be written in theGARCH(1, 1) form with σ 2

t = σ 2(1− β)+ 0× ε2t−1 + βσ 2

t−1.

5. The assumption of absence of a common root, in A4, is restrictive only if p> 1 andq > 1. Indeed if q = 1, the unique root of Aθ0(z) is 0 and we have Bθ0 (0) = 0. If p =1 and β01 = 0, the unique root of Bθ0(z) is 1/β01 > 0 (if β01 = 0, the polynomial doesnot admit any root). Because the coefficients α0i are positive this value cannot be a zeroof Aθ0(z).

6. The assumption Eηt = 0 is not required for the consistency (and asymptotic normality)of the QMLE of a GARCH. The conditional variance of εt is thus, in general, only pro-portional to ht : Var(εt | εu, u < t) = {1− (Eηt )

2}ht . The assumption Eη2t = 1 is made for

identifiability reasons and is not restrictive provided that Eη2t <∞.

Page 162: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

146 GARCH MODELS

Asymptotic Normality

The following additional assumptions are considered.

A5: θ0 ∈◦�, where

◦� denotes the interior of �.

A6: κη = Eη4t <∞.

The limiting distribution of θn is given by the following result.

Theorem 7.2 (Asymptotic normality of the QMLE) Under assumptions A1-A6,

√n(θn − θ0)

L→ N(0, (κη − 1)J−1),

where

J := Eθ0

(∂2 t (θ0)

∂θ∂θ ′

)= Eθ0

(1

σ 4t (θ0)

∂σ 2t (θ0)

∂θ

∂σ 2t (θ0)

∂θ ′

)(7.13)

is a positive definite matrix.

Remark 7.2

1. Assumption A5 is standard and entails the first-order condition (at least asymptotically).Indeed if θn is consistent, it also belongs to the interior of �, for large n. At this maximumthe derivative of the objective function cancels. However, assumption A5 is restrictivebecause it precludes, for instance, the case α01 = 0.

2. When one or several components of θ0 are null, assumption A5 is not satisfied andthe theorem cannot be used. It is clear that, in this case, the asymptotic distribution of√n(θn − θ0) cannot be normal because the estimator is constrained. If, for instance,

α01 = 0, the distribution of√n(α1 − α01) is concentrated in [0,∞), for all n, and thus

cannot be asymptotically normal. This kind of ‘boundary’ problem is the object of aspecific study in Chapter 8.

3. Assumption A6 does not concern ε2t , and does not preclude the IGARCH case. Only a

fourth-order moment assumption on ηt is required. This assumption is clearly necessary forthe existence of the variance of the score vector ∂ t (θ0)/∂θ . In the proof of this theorem,it is shown that

Eθ0

{∂ t (θ0)

∂θ

}= 0, Varθ0

{∂ t (θ0)

∂θ

}= (κη − 1)J.

4. In the ARCH case (p = 0), the asymptotic variance of the QMLE reduces to that ofthe FGLS estimator (see Theorem 6.3). Indeed, in this case we have ∂σ 2

t (θ)/∂θ = Zt−1.

Theorem 6.3 requires, however, the existence of a fourth-order moment for the observed pro-cess, whereas there is no moment assumption for the asymptotic normality of the QMLE.Moreover, Theorem 6.4 shows that the QMLE of an ARCH(q) is asymptotically moreaccurate than that of the OLS estimator.

Page 163: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 147

7.1.2 The ARCH(1) Case: Numerical Evaluationof the Asymptotic Variance

Consider the ARCH(1) modelεt = {ω0 + α0ε

2t−1}1/2ηt ,

with ω0 > 0 and α0 > 0, and suppose that the variables ηt satisfy assumption A3. The parameteris θ = (ω, α)′. In view of (2.10), the strict stationarity constraint A2 is written as

α0 < exp{−E(log η2t )}.

Assumption A1 holds true if, for instance, the parameter space is of the form � = [δ, 1/δ]×[0, 1/δ], where δ > 0 is a constant, chosen sufficiently small so that θ0 = (ω0, α0)

′ belongs to �.By Theorem 7.1, the QMLE of θ is then strongly consistent. Since ∂σ 2

t /∂θ = (1, ε2t−1)

′, the QMLEθn = (ωn, αn)

′ is characterized by the normal equation

1

n

n∑t=1

ε2t − ωn − αnε

2t−1

(ωn + αnε2t−1)

2

(1

ε2t−1

)= 0

with, for instance, ε20 = ε2

1 . This estimator does not have an explicit form and must be obtainednumerically. Theorem 7.2, which provides the asymptotic distribution of the estimator, only requires

the extra assumption that θ0 belongs to◦� = (δ, 1/δ)× (0, 1/δ). Thus, if α0 = 0 (that is, if the

model is conditionally homoscedastic), the estimator remains consistent but is no longer asymp-totically normal. Matrix J takes the form

J = Eθ0

1(ω0+α0ε

2t−1)

2

ε2t−1

(ω0+α0ε2t−1)

2

ε2t−1

(ω0+α0ε2t−1)

2

ε4t−1

(ω0+α0ε2t−1)

2

,and the asymptotic variance of

√n(θn − θ0) is

Varas{√n(θn − θ0)} = (κη − 1)J−1.

Table 7.1 displays numerical evaluations of this matrix. An estimation of J is obtained by replacingthe expectations by empirical means, obtained from simulations of length 10 000, when ηt is N(0, 1)distributed. This experiment is repeated 1000 times to obtain the results presented in the table.

In order to assess, in finite samples, the quality of the asymptotic approximation of the varianceof the estimator, the following Monte Carlo experiment is conducted. For the value θ0 of theparameter, and for a given length n, N samples are simulated, leading to N estimations θ (i)n of

Table 7.1 Asymptotic variance for the QMLE of an ARCH(1) process with ηt ∼ N(0, 1).

ω0 = 1, α0 = 0.1 ω0 = 1, α0 = 0.5 ω0 = 1, α0 = 0.95

Varas{√n(θn − θ0)}(

3.46 −1.34−1.34 1.87

) (4.85 −2.15

−2.15 3.99

) (6.61 −2.83−2.83 6.67

)

Page 164: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

148 GARCH MODELS

Table 7.2 Comparison of the empirical and theoretical asymptotic variances, for the QMLE ofthe parameter α0 = 0.9 of an ARCH(1), when ηt ∼ N(0, 1).

n αn RMSE(α){Varas[

√n(αn − α0)]

}1/2 /√n P [αn ≥ 1]

100 0.85221 0.25742 0.25014 0.266250 0.88336 0.16355 0.15820 0.239500 0.89266 0.10659 0.11186 0.152

1000 0.89804 0.08143 0.07911 0.100

θ0, i = 1, . . . N . We denote by θn = (ωn, αn)′ their empirical mean. The root mean squared error

(RMSE) of estimation of α is denoted by

RMSE(α) ={

1

N

N∑i=1

(α(i)n − αn

)2}1/2

and can be compared to{Varas [

√n(αn − α0)]

}1/2 /√n, the latter quantity being evaluated inde-

pendently, by simulation. A similar comparison can obviously be made for the parameter ω. Forθ0 = (0.2, 0.9)′ and N = 1000, Table 7.2 displays the results, for different sample length n.

The similarity between columns 3 and 4 is quite satisfactory, even for moderate sample sizes.The last column gives the empirical probability (that is, the relative frequency within the N

samples) that αn is greater than 1 (which is the limiting value for second-order stationarity). Theseresults show that, even if the mean of the estimations is close to the true value for large n, thevariability of the estimator remains high. Finally, note that the length n = 1000 remains realisticfor financial series.

7.1.3 The Nonstationary ARCH(1)

When the strict stationarity constraint is not satisfied in the ARCH(1) case, that is, when

α0 ≥ exp{−E log η2

t

}, (7.14)

one can define an ARCH(1) process starting with initial values. For a given value ε0, we define

εt = h1/2t ηt , ht = ω0 + α0ε

2t−1, t = 1, 2, . . . , (7.15)

where ω0 > 0 and α0 > 0, with the usual assumptions on the sequence (ηt ). As already noted,σ 2t converges to infinity almost surely when

α0 > exp{−E log η2

t

}, (7.16)

and only in probability when the inequality (7.14) is an equality (see Corollary 2.1 and Remark 2.3following it). Is it possible to estimate the coefficients of such a model? The answer is only partlypositive: it is possible to consistently estimate the coefficient α0, but the coefficient ω0 cannot beconsistently estimated. The practical impact of this result thus appears to be limited, but becauseof its theoretical interest, the problem of estimating coefficients of nonstationary models deservesattention. Consider the QMLE of an ARCH(1), that is to say a measurable solution of

(ωn, αn) = arg minθ∈�

1

n

n∑t=1

t (θ), t (θ) = ε2t

σ 2t (θ)

+ log σ 2t (θ), (7.17)

Page 165: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 149

where θ = (ω, α), � is a compact set of (0,∞)2, and σ 2t (θ) = ω + αε2

t−1 for t = 1, . . . , n (startingwith a given initial value for ε2

0 ). The almost sure convergence of ε2t to infinity will be used to

show the strong consistency of the QMLE of α0. The following lemma completes Corollary 2.1and gives the rate of convergence of ε2

t to infinity under (7.16).

Lemma 7.1 Define the ARCH(1) model by (7.15) with any initial condition ε20 ≥ 0. The nonsta-

tionarity condition (7.16) is assumed. Then, almost surely, as n→∞,

1

hn= o(ρn) and

1

ε2n

= o(ρn)

for any constant ρ such that

1>ρ > exp{−E log η2

t

}/α0. (7.18)

This result entails the strong consistency and asymptotic normality of the QMLE of α0.

Theorem 7.3 Consider the assumptions of Lemma 7.1 and the QMLE defined by (7.17) whereθ0 = (ω0, α0) ∈ �. Then

αn → α0 a.s., (7.19)

and when θ0 belongs to the interior of �,

√n (αn − α0)

L→ N {0, (κη − 1)α20

}(7.20)

as n→∞.

In the proof of this theorem, it is shown that the score vector satisfies

1√n

n∑t=1

∂θ t (θ0)

L→ N{

0, J = (κη − 1)

(0 00 α−2

0

)}.

In the standard statistical inference framework, the variance J of the score vector is (propor-tional to) the Fisher information. According to the usual interpretation, the form of the matrixJ shows that, asymptotically and for almost all observations, the variations of the log-likelihoodn−1/2∑n

t=1 log t (θ) are insignificant when θ varies from (ω0, α0) to (ω0 + h, α0) for small h.In other words, the limiting log-likelihood is flat at the point (ω0, α0) in the direction of variationof ω0. Thus, minimizing this limiting function does not allow θ0 to be found. This leads us tothink that the QML of ω0 is likely to be inconsistent when the strict stationarity condition is notsatisfied. Figure 7.2 displays numerical results illustrating the performance of the QMLE in finitesamples. For different values of the parameters, 100 replications of the ARCH(1) model have beengenerated, for the sample sizes n = 200 and n = 4000. The top panels of the figure correspondto a second-order stationary ARCH(1), with parameter θ0 = (1, 0.95). The panels in the middlecorrespond to a strictly stationary ARCH(1) of infinite variance, with θ0 = (1, 1.5). The resultsobtained for these two cases are similar, confirming that second-order stationarity is not neces-sary for estimating an ARCH. The bottom panels, corresponding to the explosive ARCH(1) withparameter θ0 = (1, 4), confirm the asymptotic results concerning the estimation of α0. They alsoillustrate the failure of the QML to estimate ω0 under the nonstationarity assumption (7.16). Theresults even deteriorate when the sample size increases.

Page 166: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

150 GARCH MODELS

w − w∧

∧ ∧ ∧ ∧

∧ ∧ ∧

∧ ∧ ∧

−0.5

0.5

−0.2

0.1

estimation errors when w = 1, a = 0.95

Second-order stationarity, n = 200 Second-order stationarity, n = 4000

Strict stationarity, n = 200 Strict stationarity, n = 4000

Nonstationarity, n = 200 Nonstationarity, n = 4000

a − a w − westimation errors when w = 1, a = 0.95

a − a

w − w

−0.5

0.5

−0.2

−0.2

0.1

0.1

estimation errors when w = 1, a = 1.5

a − a w − westimation errors when w = 1, a = 1.5

a − a

w − w

−0.5

0.5

estimation errors when w = 1, a = 4

a − a w − westimation errors when w = 1, a = 4

a − a

Figure 7.2 Box-plots of the QML estimation errors for the parameters ω0 and α0 of an ARCH(1)process, with ηt ∼ N(0, 1).

7.2 Estimation of ARMA-GARCH Modelsby Quasi-Maximum Likelihood

In this section, the previous results are extended to cover the situation where the GARCH process isnot directly observed, but constitutes the innovation of an observed ARMA process. This frameworkis relevant because, even for financial series, it is restrictive to assume that the observed series isthe realization of a noise. From a theoretical point of view, it will be seen that the extension to theARMA-GARCH case is far from trivial. Assume that the observations X1, . . . , Xn are generatedby a strictly stationary nonanticipative solution of the ARMA(P,Q)-GARCH(p, q) model

Xt − c0 =P∑i=1

a0i (Xt−i − c0)+ et −Q∑j=1

b0j et−j

et =√htηt

ht = ω0 +q∑i=1

α0ie2t−i +

p∑j=1

β0jht−j ,

(7.21)

Page 167: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 151

where (ηt ) and the coefficients ω0, α0i and β0j are defined as in (7.1). The orders P,Q, p, q areassumed known. The vector of the parameters is denoted by

ϕ = (ϑ ′, θ ′)′ = (c, a1, . . . aP , b1, . . . , bQ, θ′)′,

where θ is defined as previously (see (7.2)). The parameter space is

� ⊂ RP+Q+1 × (0,+∞)× [0,∞)p+q .

The true value of the parameter is denoted by

ϕ0 = (ϑ ′0, θ′0)′ = (c0, a01, . . . a0P , b01, . . . , b0Q, θ

′0)′.

We still employ a Gaussian quasi-likelihood conditional on initial values. If q ≥ Q, the initialvalues are

X0, . . . , X1−(q−Q)−P , ε−q+Q, . . . , ε1−q , σ 20 , . . . , σ

21−p.

These values (the last p of which are positive) may depend on the parameter and/or on theobservations. For any ϑ , the values of εt (ϑ), for t = −q +Q+ 1, . . . , n, and then, for any θ , thevalues of σ 2

t (ϕ), for t = 1, . . . , n, can thus be computed fromεt = εt (ϑ) = Xt − c −

P∑i=1

ai(Xt−i − c)+Q∑j=1

bj εt−j

σ 2t = σ 2

t (ϕ) = ω +q∑i=1

αi ε2t−i +

p∑j=1

βj σ2t−j .

(7.22)

When q < Q, the fixed initial values are

X0, . . . , X1−(q−Q)−P , ε0, . . . , ε1−Q, σ 20 , . . . , σ

21−p.

Conditionally on these initial values, the Gaussian log-likelihood is given by

ln(ϕ) = n−1n∑t=1

t , t = t (φ) = ε2t (ϑ)

σ 2t (ϕ)

+ log σ 2t (ϕ).

A QMLE is defined as a measurable solution of the equation

ϕn = arg minϕ∈�

ln(ϕ).

Strong Consistency

Let aϑ(z) = 1−∑Pi=1 aiz

i and bϑ(z) = 1−∑Qj=1 bj z

j . Standard assumptions are made on theseAR and MA polynomials, and assumption A1 is modified as follows:

A7: ϕ0 ∈ � and � is compact.

A8: For all ϕ ∈ �, aϑ(z)bϑ(z) = 0 implies |z|> 1.

A9: aϑ0(z) and bϑ0(z) have no common roots, a0P = 0 or b0Q = 0.

Page 168: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

152 GARCH MODELS

Under assumptions A2 and A8, (Xt) is supposed to be the unique strictly stationary nonanticipativesolution of (7.21). Let εt = εt (ϑ) = aϑ(B)b

−1ϑ (B)(Xt − c) and t = t (ϕ) = ε2

t /σ2t + log σ 2

t ,where σ 2

t = σ 2t (ϕ) is the nonanticipative and ergodic strictly stationary solution of (7.10). Note

that et = εt (ϑ0) and ht = σ 2t (ϕ0). The following result extends Theorem 7.1.

Theorem 7.4 (Consistency of the QMLE) Let (ϕn) be a sequence of QMLEs satisfying (7.2).Assume that Eηt = 0. Then, under assumptions A2–A4 and A7–A9, almost surely

ϕn → ϕ0, as n→∞.

Remark 7.3

1. As in the pure GARCH case, the theorem does not impose a finite variance for et (andthus for Xt ). In the pure ARMA case, where et = ηt admits a finite variance, this theoremreduces to a standard result concerning ARMA models with iid errors (see Brockwell andDavis, 1991, p. 384).

2. Apart from the condition Eηt = 0, the conditions required for the strong consistency of theQMLE are not stronger than in the pure GARCH case.

Asymptotic Normality When the Moment of Order 4 Exists

So far, the asymptotic results of the QMLE (consistency and asymptotic normality in the pureGARCH case, consistency in the ARMA-GARCH case) have not required any moment assumptionon the observed process (for the asymptotic normality in the pure GARCH case, a moment oforder 4 is assumed for the iid process, not for εt ). One might think that this will be the same forestablishing the asymptotic normality in the ARMA-GARCH case. The following example showsthat this is not the case.

Example 7.2 (Nonexistence of J without moment assumption) Consider the AR(1)-ARCH(1)model

Xt = a01Xt−1 + et , et =√htηt , ht = ω0 + α0e

2t−1 (7.23)

where |a01| < 1, ω0 > 0, α0 ≥ 0, and the distribution of the iid sequence (ηt ) is defined,for a > 1, by

P(ηt = a) = P(ηt = −a) = 1

2a2, P(ηt = 0) = 1− 1

a2.

Then the process (Xt ) is always stationary, for any value of α0 (because exp{−E(log η2

t )} = +∞;

see the strict stationarity constraint (2.10)). By contrast, Xt does not admit a moment of order 2when α0 ≥ 1 (see Theorem 2.2). The first component of the (normalized) score vector is

∂ t (θ0)

∂a1=(

1− e2t

σ 2t

)(1

ht

∂σ 2t (θ0)

∂a1

)+ 2et

ht

∂εt (θ0)

∂a1

= −2α0(1− η2

t

) (et−1Xt−2

ht

)− 2

ηtXt−1√ht

.

Page 169: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 153

We have

E

{α0(1− η2

t

) (et−1Xt−2

ht

)+ ηtXt−1√

ht

}2

≥ E

[{α0(1− η2

t

) (et−1Xt−2

ht

)+ ηtXt−1√

ht

}2∣∣∣∣∣ ηt−1 = 0

]P(ηt−1 = 0)

= a201

ω0

(1− 1

a2

)E(X2t−2

)since, first, ηt−1 = 0 entails εt−1 = 0 and Xt−1 = a01Xt−2, and second, ηt−1 and Xt−2 are inde-pendent. Consequently, if EX2

t = ∞ and a01 = 0, the score vector does not admit a variance.

This example shows that it is not possible to extend the result of asymptotic normality obtainedin the GARCH case to the ARMA-GARCH models without additional moment assumptions. Thisis not surprising because for ARMA models (which can be viewed as limits of ARMA-GARCHmodels when the coefficients α0i and β0j tend to 0) the asymptotic normality of the QMLE isshown with second-order moment assumptions. For an ARMA with infinite variance innovations,the consistency of the estimators may be faster than in the standard case and the asymptoticdistribution is stable, but non-Gaussian in general. We show the asymptotic normality with amoment assumption of order 4. Recall that, by Theorem 2.9, this assumption is equivalent toρ {E(A0t ⊗ A0t )} < 1. We make the following assumptions:

A10: ρ {E(A0t ⊗ A0t )} < 1 and, for all θ ∈ �,∑p

j=1 βj < 1.

A11: ϕ0 ∈◦�, where

◦� denotes the interior of �.

A12: There exists no set ! of cardinality 2 such that P(ηt ∈ !) = 1.

Assumption A10 implies that κη = E(η4t ) <∞ and makes assumption A2 superfluous. The

identifiability assumption A12 is slightly stronger than the first part of assumption A3 whenthe distribution of ηt is not symmetric. We are now in a position to state conditions ensuringthe asymptotic normality of the QMLE of an ARMA-GARCH model.

Theorem 7.5 (Asymptotic normality of the QMLE) Assume that Eηt = 0 and that assump-tions A3, A4 and A8–A12 hold true. Then

√n(ϕn − ϕ0)

L→ N(0, ),

where = J−1IJ−1,

I = Eϕ0

(∂ t (ϕ0)

∂ϕ

∂ t (ϕ0)

∂ϕ′

), J = Eϕ0

(∂2 t (ϕ0)

∂ϕ∂ϕ′

).

If, in addition, the distribution of ηt is symmetric, we have

I =(I1 00 I2

), J =

(J1 00 J2

),

Page 170: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

154 GARCH MODELS

with

I1 = (κη − 1)Eϕ0

(1

σ 4t

∂σ 2t

∂ϑ

∂σ 2t

∂ϑ ′(ϕ0)

)+ 4Eϕ0

(1

σ 2t

∂εt

∂ϑ

∂εt

∂ϑ ′(ϕ0)

),

I2 = (κη − 1)Eϕ0

(1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(ϕ0)

),

J1 = Eϕ0

(1

σ 4t

∂σ 2t

∂ϑ

∂σ 2t

∂ϑ ′(ϕ0)

)+ Eϕ0

(2

σ 2t

∂εt

∂ϑ

∂εt

∂ϑ ′(ϕ0)

),

J2 = Eϕ0

(1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(ϕ0)

).

Remark 7.4

1. It is interesting to note that if ηt has a symmetric law, then the asymptotic variance isblock-diagonal, which is interpreted as an asymptotic independence between the estimatorsof the ARMA coefficients and those of the GARCH coefficients. The asymptotic distributionof the estimators of the ARMA coefficients depends, however, on the GARCH coefficients(in view of the form of the matrices I1 and J1 involving the derivatives of σ 2

t ). On theother hand, still when the distribution of ηt is symmetric, the asymptotic accuracy of theestimation of the GARCH parameters is not affected by the ARMA part: the lower leftblock J−1

2 I2J−12 of depends only on the GARCH coefficients. The block-diagonal form

of may also be of interest for testing problems of joint assumptions on the ARMA andGARCH parameters.

2. Assumption A11 imposes the strict positivity of the GARCH coefficients and it is easy tosee that this assumption constrains only the GARCH coefficients. For any value of ϑ0, therestriction of � to its first P +Q+ 1 coordinates can be chosen sufficiently large so thatits interior contains ϑ0 and assumption A8 is satisfied.

3. In the proof of the theorem, the symmetry of the iid process distribution is used to showthe following result, which is of independent interest.

If the distribution of ηt is symmetric then,

∀j, E{g(ε2

t , ε2t−1, . . . )εt−j f (εt−j−1, εt−j−2, . . . )

} = 0, (7.24)

provided this expectation exists (see Exercise 7.1).

Example 7.3 (Numerical evaluation of the asymptotic variance) Consider the AR(1)-ARCH(1) model defined by (7.23). In the case where ηt follows the N(0, 1) law, condition A10for the existence of a moment of order 4 is written as 3α2

0 < 1, that is, α0 < 0.577 (see (2.54)).In the case where ηt follows the χ2(1) distribution, normalized in such a way that Eηt = 0 andEη2

t = 1, this condition is written as 15α20 < 1, that is, α0 < 0.258. To simplify the computation,

assume that ω0 = 1 is known. Table 7.3 provides a numerical evaluation of the asymptoticvariance , for these two distributions and for different values of the parameters a0 and α0. Itis clear that the asymptotic variance of the two parameters strongly depends on the distributionof the iid process. These experiments confirm the independence of the asymptotic distributions ofthe AR and ARCH parameters in the case where the distribution of ηt is symmetric. They revealthat the independence does not hold when this assumption is relaxed. Note the strong impactof the ARCH coefficient on the asymptotic variance of the AR coefficient. On the other hand,the simulations confirm that in the case where the distribution is symmetric, the AR coefficienthas no impact on the asymptotic accuracy of the ARCH coefficient. When the distribution is not

Page 171: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 155

Table 7.3 Matrices of asymptotic variance of the estimator of (a0, α0) for an AR(1)-ARCH(1), when ω0 = 1 is known and the distribution of ηt is N(0, 1) or normalized χ2(1).

α0 = 0 α0 = 0.1 α0 = 0.25 α0 = 0.5

a0 = 0

ηt ∼ N(0, 1)

(1.00 0.000.00 0.67

) (1.14 0.000.00 1.15

) (1.20 0.000.00 1.82

) (1.08 0.000.00 2.99

)

ηt ∼ χ2(1)

(1.00 −0.54

−0.54 0.94

) (1.70 −1.63

−1.63 8.01

) (2.78 −1.51−1.51 18.78

)–

a0 = −0.5

ηt ∼ N(0, 1)

(0.75 0.000.00 0.67

) (0.82 0.000.00 1.15

) (0.83 0.000.00 1.82

) (0.72 0.000.00 2.99

)

ηt ∼ χ2(1)

(0.75 −0.40

−0.40 0.94

) (1.04 −0.99

−0.99 8.02

) (1.41 −0.78−0.78 18.85

)–

a0 = −0.9

ηt ∼ N(0, 1)

(0.19 0.000.00 0.67

) (0.19 0.000.00 1.15

) (0.18 0.000.00 1.82

) (0.13 0.000.00 2.98

)

ηt ∼ χ2(1)

(0.19 −0.10

−0.10 0.94

) (0.20 −0.19

−0.19 8.01

) (0.21 −0.12−0.12 18.90

)–

symmetric, the impact, if there is any, is very weak. For the computation of the expectationsinvolved in the matrix , see Exercise 7.8. In particular, the values corresponding to α0 = 0(AR(1) without ARCH effect) can be analytically computed. Note also that the results obtainedfor the asymptotic variance of the estimator of the ARCH coefficient in the case a0 = 0 do notcoincide with those of Table 7.2. This is not surprising because in this table ω0 is not supposedto be known.

7.3 Application to Real Data

In this section, we employ the QML method to estimate GARCH(1, 1) models on daily returns of11 stock market indices, namely the CAC, DAX, DJA, DJI, DJT, DJU, FTSE, Nasdaq, Nikkei,SMI and S&P 500 indices. The observations cover the period from January 2, 1990 to January 22,20091 (except for those indices for which the first observation is after 1990). The GARCH(1, 1)model has been chosen because it constitutes the reference model, by far the most commonly usedin empirical studies. However, in Chapter 8 we will see that it can be worth considering modelswith higher orders p and q.

Table 7.4 displays the estimators of the parameters ω, α, β, together with their estimatedstandard deviations. The last column gives estimates of ρ4 = (α + β)2 + (Eη4

0 − 1)α2, obtainedby replacing the unknown parameters by their estimates and Eη4

0 by the empirical mean of thefourth-order moment of the standardized residuals. We have Eε4

0 <∞ if and only if ρ4 < 1. The

1 For the Nasdaq an outlier has been eliminated because the base price was reset on the trading dayfollowing December 31, 1993.

Page 172: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

156 GARCH MODELS

Table 7.4 GARCH(1, 1) models estimated by QML for 11 indices. The estimated standard

deviations are given in parentheses. ρ4 = (α + β)2 + (Eη40 − 1)α2.

Index ω α β ρ4

CAC 0.033 (0.009) 0.090 (0.014) 0.893 (0.015) 1.0067DAX 0.037 (0.014) 0.093 (0.023) 0.888 (0.024) 1.0622DJA 0.019 (0.005) 0.088 (0.014) 0.894 (0.014) 0.9981DJI 0.017 (0.004) 0.085 (0.013) 0.901 (0.013) 1.0020DJT 0.040 (0.013) 0.089 (0.016) 0.894 (0.018) 1.0183DJU 0.021 (0.005) 0.118 (0.016) 0.865 (0.014) 1.0152FTSE 0.013 (0.004) 0.091 (0.014) 0.899 (0.014) 1.0228Nasdaq 0.025 (0.006) 0.072 (0.009) 0.922 (0.009) 1.0021Nikkei 0.053 (0.012) 0.100 (0.013) 0.880 (0.014) 0.9985SMI 0.049 (0.014) 0.127 (0.028) 0.835 (0.029) 1.0672S&P 500 0.014 (0.004) 0.084 (0.012) 0.905 (0.012) 1.0072

estimates of the GARCH coefficients are quite homogenous over all the series, and are similar tothose usually obtained in empirical studies of daily returns. The coefficients α are close to 0.1,and the coefficients β are close to 0.9, which indicates a strong persistence of the shocks on thevolatility. The sum α + β is greater than 0.98 for 10 of the 11 series, and greater than 0.96 forall the series. Since α + β < 1, the assumption of second-order stationarity cannot be rejected, forany series (see Section 8.1). A fortiori, by Remark 2.6 the strict stationarity cannot be rejected.Note that the strict stationarity assumption, E log(α1η

2t + β1) < 0, seems difficult to test directly

because it not only relies on the GARCH coefficients but also involves the unknown distributionof ηt . The existence of moments of order 4, Eε4

t <∞, is questionable for all the series because(α + β)2 + (Eη4

0 − 1)α2 is extremely close to 1. Recall, however, that the asymptotic propertiesof the QML do not require any moment on the observed process but do require strict stationarity.

7.4 Proofs of the Asymptotic Results*

We denote by K and ρ generic constants whose values can vary from line to line. As an example,one can write for 0 < ρ1 < 1 and 0 < ρ2 < 1, i1 ≥ 0, i2 ≥ 0,

0 < K∑i≥i1

ρi1 +K∑i≥i2

iρi2 ≤ Kρmin(i1,i2).

Proof of Theorem 7.1The proof is based on a vectorial autoregressive representation of order 1 of the vector σ 2

t =(σ 2t , σ

2t−1, . . . , σ

2t−p+1

), analogous to that used for the study of stationarity. Assumption A2 allows

us to write σ 2t as a series depending on the infinite past of the variable ε2

t . It can be shown thatthe initial values are not important asymptotically, using the fact that, under the strict stationarityassumption, ε2

t necessarily admits a moment order s, with s > 0. This property also allows us toshow that the expectation of t (θ0) is well defined in R and that Eθ0( t (θ))− Eθ0( t (θ0)) ≥ 0,which guarantees that the limit criterion is minimized at the true value. The difficulty is thatEθ0(

+t (θ)) can be equal to +∞. Assumptions A3 and A4 are crucial to establishing the identifia-

bility: the former assumption precludes the existence of a constant linear combination of the ε2t−j ,

j ≥ 0. The assumption of absence of common root is also used. The ergodicity of t (θ) and acompactness argument conclude the proof.

Page 173: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 157

It will be convenient to rewrite (7.10) in matrix form. We have

σ 2t = ct + Bσ 2

t−1, (7.25)

where

σ 2t =

σ 2t

σ 2t−1...

σ 2t−p+1

, ct =

ω +

∑q

i=1αiε

2t−i

0...

0

, B =

β1 β2 · · · βp1 0 · · · 0...

0 · · · 1 0

. (7.26)

We will establish the following intermediate results.

(a) limn→∞ supθ∈� |ln(θ)− ln(θ)| = 0, a.s.

(b) (∃t ∈ Z such that σ 2t (θ) = σ 2

t (θ0)Pθ0 a.s.) �⇒ θ = θ0.

(c) Eθ0 | t (θ0)| <∞, and if θ = θ0, Eθ0 t (θ)>Eθ0 t (θ0).

(d) For any θ = θ0, there exists a neighborhood V (θ) such that

lim infn→∞ inf

θ∗∈V (θ)ln(θ∗)>Eθ0 1(θ0), a.s.

(a) Asymptotic irrelevance of the initial values. In view of Corollary 2.2, the condition∑p

j=1 βj < 1 of assumption A2 implies that ρ(B) < 1. The compactness of � implies that

supθ∈�

ρ(B) < 1. (7.27)

Iterating (7.25), we thus obtain

σ 2t = ct + Bct−1 + B2ct−2 + · · · + Bt−1c1 + Btσ 2

0 =∞∑k=0

Bkct−k. (7.28)

Let σ 2t be the vector obtained by replacing σ 2

t−i by σ 2t−i in σ 2

t , and let ct be the vector obtainedby replacing ε2

0 , . . . , ε21−q by the initial values (7.6) or (7.7). We have

σ 2t = ct + Bct−1 + · · · + Bt−q−1cq+1 + Bt−q cq + · · · + Bt−1c1 + Bt σ 2

0. (7.29)

From (7.27), it follows that almost surely

supθ∈�

‖σ 2t − σ 2

t ‖ = supθ∈�

∥∥∥∥∥{

q∑k=1

Bt−k (ck − ck)+ Bt

(σ 2

0 − σ 20

)}∥∥∥∥∥≤ Kρt , ∀t. (7.30)

For x > 0 we have log x ≤ x − 1. It follows that, for x, y > 0,∣∣∣log x

y

∣∣∣ ≤ |x−y|min(x,y) . We thus have

almost surely, using (7.30),

supθ∈�

|ln(θ)− ln(θ)| ≤ n−1n∑t=1

supθ∈�

{∣∣∣∣ σ 2t − σ 2

t

σ 2t σ

2t

∣∣∣∣ ε2t +

∣∣∣∣log

(σ 2t

σ 2t

)∣∣∣∣}

≤{

supθ∈�

1

ω2

}Kn−1

n∑t=1

ρt ε2t +

{supθ∈�

1

ω

}Kn−1

n∑t=1

ρt . (7.31)

Page 174: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

158 GARCH MODELS

The existence of a moment of order s > 0 for ε2t , deduced from assumption A1 and Corollary 2.3,

allows us to show that ρt ε2t → 0 a.s. (see Exercise 7.2). Using Cesaro’s lemma, point (a) follows.

(b) Identifiability of the parameter. Assume that σ 2t (θ) = σ 2

t (θ0), Pθ0 a.s. By Corollary 2.2, thepolynomial Bθ (B) is invertible under assumption A2. Using (7.10), we obtain{Aθ (B)

Bθ (B)− Aθ0(B)

Bθ0(B)

}ε2t =

ω0

Bθ0 (1)− ω

Bθ (1) a.s. ∀t.

If the operator in B between braces were not null, then there would exist a constant linear combi-nation of the ε2

t−j , j ≥ 0. Thus the linear innovation of the process (ε2t ) would be equal to zero.

Since the distribution of η2t is nondegenerate, in view of assumption A3,

ε2t − Eθ0 (ε

2t |ε2

t−1, . . . ) = σ 2t (θ0)(η

2t − 1) = 0, with positive probability.

We thus have

Aθ (z)

Bθ (z)= Aθ0 (z)

Bθ0 (z), ∀|z| ≤ 1 and

ω

Bθ (1)= ω0

Bθ0(1). (7.32)

Under assumption A4 (absence of common root), it follows that Aθ (z) = Aθ0 (z), Bθ (z) = Bθ0 (z)

and ω = ω0. We have thus shown (b).

(c) The limit criterion is minimized at the true value. The limit criterion is not integrableat any point, but Eθ0 ln(θ) = Eθ0 t (θ) is well defined in R ∪ {+∞} because, with the notationx− = max(−x, 0) and x+ = max(x, 0),

Eθ0 −t (θ) ≤ Eθ0 log− σ 2

t ≤ max{0,− logω} <∞.2

It is, however, possible to have Eθ0 t (θ) = ∞ for some values of θ . This occurs, for instance,when θ = (ω, 0, . . . , 0) and (εt ) is an IGARCH such that Eθ0ε

2t = ∞. We will see that this cannot

occur at θ0, meaning that the criterion is integrable at θ0. To establish this result, we have to showthat Eθ0

+t (θ0) <∞. Using Jensen’s inequality and, once again, the existence of a moment of

order s > 0 for ε2t , we obtain

Eθ0 log+ σ 2t (θ0) <∞

becauseEθ0 log σ 2

t (θ0) = Eθ0

1

slog{σ 2t (θ0)

}s ≤ 1

slogEθ0

{σ 2t (θ0)

}s<∞.

Thus

Eθ0 t (θ0) = Eθ0

{σ 2t (θ0)η

2t

σ 2t (θ0)

+ log σ 2t (θ0)

}= 1+ Eθ0 log σ 2

t (θ0) <∞.

Having already established that Eθ0 −t (θ0) <∞, it follows that Eθ0 t (θ0) is well defined in R.

Since for all x > 0, log x ≤ x − 1 with equality if and only if x = 1, we have

Eθ0 t (θ)− Eθ0 t (θ0) = Eθ0 logσ 2t (θ)

σ 2t (θ0)

+ Eθ0

σ 2t (θ0)η

2t

σ 2t (θ)

− Eθ0η2t

= Eθ0 logσ 2t (θ)

σ 2t (θ0)

+ Eθ0

σ 2t (θ0)

σ 2t (θ)

− 1

≥ Eθ0

{log

σ 2t (θ)

σ 2t (θ0)

+ logσ 2t (θ0)

σ 2t (θ)

}= 0 (7.33)

2 We use here the fact that (f + g)− ≤ g− for f ≥ 0, and that if f ≤ g then f− ≥ g−.

Page 175: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 159

with equality if and only if σ 2t (θ0)/σ

2t (θ) = 1 Pθ0 -a.s., that is, in view of (b), if and only

if θ = θ0.3

(d) Compactness of � and ergodicity of (�t (θ)). For all θ ∈ � and any positive integer k, letVk(θ) be the open ball of center θ and radius 1/k. Because of (a), we have

lim infn→∞ inf

θ∗∈Vk(θ)∩�ln(θ∗) ≥ lim inf

n→∞ infθ∗∈Vk(θ)∩�

ln(θ∗)− lim supn→∞

supθ∈�

|ln(θ)− ln(θ)|

≥ lim infn→∞ n−1

n∑t=1

infθ∗∈Vk(θ)∩�

t (θ∗).

To obtain the convergence of this empirical mean, the standard ergodic theorem cannot be applied(see Theorem A.2) because we have seen that t (θ∗) is not necessarily integrable, except at θ0.We thus use a modified version of this theorem, which allows for an ergodic and strictly stationarysequence of variables admitting an expectation in R ∪ {+∞} (see Exercise 7.3). This version of theergodic theorem can be applied to { t (θ∗)}, and thus to {infθ∗∈Vk(θ)∩� t (θ

∗)} (see Exercise 7.4),which allows us to conclude that

lim infn→∞ n−1

n∑t=1

infθ∗∈Vk(θ)∩�

t (θ∗) = Eθ0 inf

θ∗∈Vk(θ)∩� 1(θ

∗).

By Beppo Levi’s theorem, Eθ0 infθ∗∈Vk(θ)∩� 1(θ∗) increases to Eθ0 1(θ) as k→∞. Given (7.33),

we have shown (d).The conclusion of the proof uses a compactness argument. First note that for any neighborhood

V (θ0) of θ0,

lim supn→∞

infθ∗∈V (θ0)

ln(θ∗) ≤ limn→∞ ln(θ0) = lim

n→∞ ln(θ0) = Eθ0 1(θ0). (7.34)

The compact set � is covered by the union of an arbitrary neighborhood V (θ0) of θ0 and the setof the neighborhoods V (θ) satisfying (d), θ ∈ � \ V (θ0). Thus, there exists a finite subcover of �of the form V (θ0), V (θ1), . . . , V (θk), where, for i = 1, . . . , k, V (θi) satisfies (d). It follows that

infθ∈�

ln(θ) = mini=0,1,...,k

infθ∈�∩V (θi )

ln(θ).

The relations (d) and (7.34) show that, almost surely, θn belongs to V (θ0) for n large enough.Since this is true for any neighborhood V (θ0), the proof is complete.

Proof of Theorem 7.2The proof of this theorem is based on a standard Taylor expansion of criterion (7.8) at θ0. Since θnconverges to θ0, which lies in the interior of the parameter space by assumption A5, the derivativeof the criterion is equal to zero at θn. We thus have

0 = n−1/2n∑t=1

∂θ t (θn)

= n−1/2n∑t=1

∂θ t (θ0)+

(1

n

n∑t=1

∂2

∂θi∂θj t (θ

∗ij )

)√n(θn − θ0

)(7.35)

3 To show (7.33) it can be assumed that Eθ0 | log σ 2t (θ)| <∞ and that Eθ0 |ε2

t /σ2t (θ)| <∞ (in order to use

the linearity property of the expectation), otherwise Eθ0 t (θ) = +∞ and the relation is trivially satisfied.

Page 176: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

160 GARCH MODELS

where the θ∗ij are between θn and θ0. It will be shown that

n−1/2n∑t=1

∂θ t (θ0)

L→ N (0, (κη − 1)J), (7.36)

and that

n−1n∑t=1

∂2

∂θi∂θj t (θ

∗ij )→ J (i, j) in probability. (7.37)

The proof of the theorem immediately follows. We will split the proof of (7.36) and (7.37) intoseveral parts:

(a) Eθ0

∥∥∥ ∂ t (θ0)∂θ

∂ t (θ0)∂θ ′

∥∥∥ <∞, Eθ0

∥∥∥ ∂2 t (θ0)∂θ∂θ ′

∥∥∥ <∞.

(b) J is invertible and Varθ0

{∂ t (θ0)∂θ

}= {κη − 1

}J .

(c) There exists a neighborhood V(θ0) of θ0 such that, for all i, j, k ∈ {1, . . . , p + q + 1},

Eθ0 supθ∈V(θ0)

∣∣∣∣ ∂3 t (θ)

∂θi∂θj ∂θk

∣∣∣∣ <∞.

(d)∥∥∥n−1/2∑n

t=1

{∂ t (θ0)∂θ

− ∂ t (θ0)∂θ

}∥∥∥ and supθ∈V(θ0)

∥∥∥n−1∑nt=1

{∂2 t (θ)∂θ∂θ ′ − ∂2 t (θ)

∂θ∂θ ′}∥∥∥ tend in

probability to 0 as n→∞.

(e) n−1/2∑nt=1

∂∂θ t (θ0)

L→ N (0, (κη − 1)J).

(f) n−1∑nt=1

∂2

∂θi ∂θj t (θ

∗ij )→ J (i, j) a.s.

(a) Integrability of the derivatives of the criterion at θ0. Since t (θ) = ε2t /σ

2t + log σ 2

t ,we have

∂ t (θ)

∂θ={

1− ε2t

σ 2t

}{1

σ 2t

∂σ 2t

∂θ

}, (7.38)

∂2 t (θ)

∂θ∂θ ′={

1− ε2t

σ 2t

}{1

σ 2t

∂2σ 2t

∂θ∂θ ′

}+{

2ε2t

σ 2t

− 1

}{1

σ 2t

∂σ 2t

∂θ

}{1

σ 2t

∂σ 2t

∂θ ′

}. (7.39)

At θ = θ0, the variable ε2t /σ

2t = η2

t is independent of σ 2t and its derivatives. To show (a), it thus

suffices to show that

Eθ0

∥∥∥∥ 1

σ 2t

∂σ 2t

∂θ(θ0)

∥∥∥∥ <∞, Eθ0

∥∥∥∥ 1

σ 2t

∂2σ 2t

∂θ∂θ ′(θ0)

∥∥∥∥ <∞, Eθ0

∥∥∥∥ 1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ0)

∥∥∥∥ <∞. (7.40)

In view of (7.28), we have

∂σ 2t

∂ω=

∞∑k=0

Bk1,∂σ 2

t

∂αi=

∞∑k=0

Bkε2t−k−i , (7.41)

∂σ 2t

∂βj=

∞∑k=1

{k∑i=1

Bi−1B(j)Bk−i}ct−k, (7.42)

Page 177: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 161

where 1 = (1, 0, . . . , 0)′, ε2t = (ε2

t , 0, . . . , 0)′, and B(j) is a p × p matrix with 1 in position (1, j)and zeros elsewhere. Note that, in view of the positivity of the coefficients and (7.41)–(7.42), thederivatives of σ 2

t are positive or null. In view of (7.41), it is clear that ∂σ 2t /∂ω is bounded. Since

σ 2t ≥ ω> 0, the variable {∂σ 2

t /∂ω}/σ 2t is also bounded. This variable thus possesses moments of

all orders. In view of the second equality in (7.41) and of the positivity of all the terms involvedin the sums, we have

αi∂σ 2

t

∂αi=

∞∑k=0

Bkαiε2t−k−i ≤

∞∑k=0

Bkct−k = σ 2t .

It follows that

1

σ 2t

∂σ 2t

∂αi≤ 1

αi. (7.43)

The variable σ−2t (∂σ 2

t /∂αi) thus admits moments of all orders at θ = θ0. In view of (7.42) andβjB

(j) ≤ B, we have

βj∂σ 2

t

∂βj≤

∞∑k=1

{k∑i=1

Bi−1BBk−i}ct−k =

∞∑k=1

kBkct−k. (7.44)

Using (7.27), we have ‖Bk‖ ≤ Kρk for all k. Moreover, ε2t having a moment of order s ∈ (0, 1), the

variable ct (1) = ω +∑q

i=1 αiε2t−i has the same moment.4 Using in addition (7.44), the inequality

σ 2t ≥ ω + Bk(1, 1)ct−k(1) and the relation x/(1+ x) ≤ xs for all x ≥ 0,5 we obtain

Eθ0

1

σ 2t

∂σ 2t

∂βj≤ Eθ0

1

βj

∞∑k=1

kBk(1, 1)ct−k(1)ω + Bk(1, 1)ct−k(1)

≤ 1

βj

∞∑k=1

kEθ0

{Bk(1, 1)ct−k(1)

ω

}s

≤ Ks

ωsβjEθ0

{ct−k(1)

}s ∞∑k=1

kρsk ≤ K

βj. (7.45)

Under assumption A5 we have β0j > 0 for all j , which entails that the first expectation in (7.40)exists.

We now turn to the higher-order derivatives of σ 2t . In view of the first equality of (7.41),

we have

∂2σ 2t

∂ω2= ∂2σ 2

t

∂ω∂αi= 0 and

∂2σ 2t

∂ω∂βj=

∞∑k=1

{k∑i=1

Bi−1B(j)Bk−i}

1. (7.46)

We thus have

βi∂2σ 2

t

∂ω∂βj≤

∞∑k=1

kBk1,

4 We use the inequality (a + b)s ≤ as + bs for all a, b ≥ 0 and any s ∈ (0, 1]. Indeed, xs ≥ x for allx ∈ [0, 1], and if a + b> 0,

(a

a+b)s + ( b

a+b)s ≥ a

a+b + ba+b = 1.

5 If x ≥ 1 then xs ≥ 1 ≥ x/(1+ x). If 0 ≤ x ≤ 1 then xs ≥ x ≥ x/(1+ x).

Page 178: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

162 GARCH MODELS

which is a vector of finite constants (since ρ(B) < 1). It follows that ∂2σ 2t (θ0)/∂ω∂θi is bounded,

and thus admits moments of all orders. It is of course the same for{∂2σ 2

t (θ0)/∂ω∂θi}/σ 2

t (θ0).

The second equality of (7.41) gives

∂2σ 2t

∂αi∂αj= 0 and

∂2σ 2t

∂αi∂βj=

∞∑k=1

{k∑i=1

Bi−1B(j)Bk−i}ε2t−k−i . (7.47)

The arguments used for (7.45) then show that

Eθ0

∂2σ 2t /∂αi∂βj

σ 2t

≤ K∗

βj.

This entails that{∂2σ 2

t (θ0)/∂αi∂θ}/σ 2

t (θ0) is integrable. Differentiating relation (7.42) with respectto βj ′ , we obtain

βjβj ′∂2σ 2

t

∂βj ∂βj ′= βjβj ′

∞∑k=2

[k∑i=2

{(i−1∑ =1

B −1B(j ′)Bi−1− )B(j)Bk−i

}

+k−1∑i=1

{Bi−1B(j)

(k−i∑ =1

B −1B(j ′)Bk−i− )}]

ct−k

≤∞∑k=2

[k∑i=2

(i − 1)Bk +k−1∑i=1

(k − i)Bk

]ct−k

=∞∑k=2

k(k − 1)Bkct−k (7.48)

because βjB(j) ≤ B. As for (7.45), it follows that

Eθ0

∂2σ 2t /∂βj ∂β

′j

σ 2t

≤ K∗

βjβj ′

and the existence of the second expectation in (7.40) is proven.Since

{∂σ 2

t /∂ω}/σ 2

t is bounded, and since by (7.43) the variables{∂σ 2

t /∂αi}/σ 2

t are boundedat θ0, it is clear that

Eθ0

∥∥∥∥ 1

σ 4t (θ0)

∂σ 2t (θ0)

∂θi

∂σ 2t (θ0)

∂θ

∥∥∥∥ <∞,

for i = 1, . . . , q + 1. With the notation and arguments already used to show (7.45), and using theelementary inequality x/(1+ x) ≤ xs/2 for all x ≥ 0, Minkowski’s inequality implies that{

Eθ0

(1

σ 2t (θ0)

∂σ 2t (θ0)

∂βj

)2}1/2

≤ 1

β0j

∞∑k=1

k

{Eθ0

(Bk(1, 1)ct−k(1)

ω0

)s}1/2

<∞.

Finally, the Cauchy–Schwarz inequality entails that the third expectation of (7.40) exists.

(b) Invertibility of J and connection with the variance of the criterion derivative. Using (a),and once again the independence between η2

t = ε2t /σ

2t (θ0) and σ 2

t and its derivatives, we haveby (7.38),

Eθ0

{∂ t (θ0)

∂θ

}= Eθ0 (1− η2

t )Eθ0

{1

σ 2t (θ0)

∂σ 2t (θ0)

∂θ

}= 0.

Page 179: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 163

Moreover, in view of (7.40), J exists and satisfies (7.13). We also have

Varθ0

{∂ t (θ0)

∂θ

}= Eθ0

{∂ t (θ0)

∂θ

∂ t (θ0)

∂θ ′

}(7.49)

= E{(1− η2

t )2}Eθ0

{∂σ 2

t (θ0)/∂θ

σ 2t (θ0)

∂σ 2t (θ0)/∂θ

σ 2t (θ0)

}= {κη − 1

}J.

Assume now that J is singular. Then there exists a nonzero vector λ in Rp+q+1 such thatλ′{∂σ 2

t (θ0)/∂θ} = 0 a.s.6 In view of (7.10) and the stationarity of {∂σ 2

t (θ0)/∂θ}t , we have

0 = λ′∂σ 2

t (θ0)

∂θ= λ′

1ε2t−1...

ε2t−q

σ 2t−1(θ0)

...

σ 2t−p(θ0)

+

p∑j=1

βjλ′ ∂σ

2t−j (θ0)

∂θ= λ′

1ε2t−1...

ε2t−q

σ 2t−1(θ0)

...

σ 2t−p(θ0)

.

Let λ = (λ0, λ1, . . . , λq+p)′. It is clear that λ1 = 0, otherwise ε2t−1 would be measurable

with respect to the σ -field generated by {ηu, u < t − 1}. For the same reason, we haveλ2 = · · · = λ2+i = 0 if λq+1 = · · · = λq+i = 0. Consequently, λ = 0 implies the existence of aGARCH(p − 1, q − 1) representation. By the arguments used to show (7.32), assumption A4entails that this is impossible. It follows that λ′Jλ = 0 implies λ = 0, which completes theproof of (b).

(c) Uniform integrability of the third-order derivatives of the criterion. Differentiating (7.39),we obtain

∂3 t (θ)

∂θi∂θj ∂θk={

1− ε2t

σ 2t

}{1

σ 2t

∂3σ 2t

∂θi∂θj∂θk

}(7.50)

+{

2ε2t

σ 2t

− 1

}{1

σ 2t

∂σ 2t

∂θi

}{1

σ 2t

∂2σ 2t

∂θj ∂θk

}+{

2ε2t

σ 2t

− 1

}{1

σ 2t

∂σ 2t

∂θj

}{1

σ 2t

∂2σ 2t

∂θi∂θk

}+{

2ε2t

σ 2t

− 1

}{1

σ 2t

∂σ 2t

∂θk

}{1

σ 2t

∂2σ 2t

∂θi∂θj

}+{

2− 6ε2t

σ 2t

}{1

σ 2t

∂σ 2t

∂θi

}{1

σ 2t

∂σ 2t

∂θj

}{1

σ 2t

∂σ 2t

∂θk

}.

We begin by studying the integrability of {1− ε2t /σ

2t }. This is the most difficult term to deal

with. Indeed, the variable ε2t /σ

2t is not uniformly integrable on �: at θ = (ω, 0′), the ratio ε2

t /σ2t is

6 We have

λ′Jλ = E

[1

σ 4t (θ0)

(λ′∂σ 2

t (θ0)

∂θ

)2]= 0

if and only if σ−2t (θ0)

(λ′∂σ 2

t (θ0)/∂θ)2 = 0 a.s., that is, if and only if

(λ′∂σ 2

t (θ0)/∂θ)2 = 0 a.s.

Page 180: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

164 GARCH MODELS

integrable only if Eε2t exists. We will, however, show the integrability of {1− ε2

t /σ2t } uniformly in

θ in the neighborhood of θ0. Let �∗ be a compact set which contains θ0 and which is contained inthe interior of � (∀θ ∈ �∗, we have θ ≥ θ∗> 0 component by component). Let B0 be the matrix B(defined in (7.26)) evaluated at the point θ = θ0. For all δ > 0, there exists a neighborhood V(θ0)

of θ0, included in �∗, such that for all θ ∈ V(θ0),

B0 ≤ (1+ δ)B (i.e. B0(i, j) ≤ (1+ δ)B(i, j) for all i and j ).

Note that, since V(θ0) ⊂ �∗, we have supθ∈V(θ0)1/αi <∞. From (7.28), we obtain

σ 2t = ω

∞∑k=0

Bk(1, 1)+q∑i=1

αi

{ ∞∑k=0

Bk(1, 1)ε2t−k−i

}

and, again using x/(1+ x) ≤ xs for all x ≥ 0 and all s ∈ (0, 1),

supθ∈V(θ0)

σ 2t (θ0)

σ 2t

≤ supθ∈V(θ0)

{ω0∑∞

k=0 Bk0 (1, 1)

ω+

q∑i=1

α0i

( ∞∑k=0

Bk0 (1, 1)ε2

t−k−iω + αiBk(1, 1)ε2

t−k−i

)}

≤ K +q∑i=1

supθ∈V(θ0)

{α0i

αi

∞∑k=0

Bk0 (1, 1)

Bk(1, 1)

(αiB

k(1, 1)ε2t−k−i

ω

)s}

≤ K +K

q∑i=1

∞∑k=0

(1+ δ)kρksε2st−k−i . (7.51)

If s is chosen such that Eε2st <∞ and, for instance, δ = (1− ρs)/(2ρs), then the expectation of

the previous series is finite. It follows that there exists a neighborhood V(θ0) of θ0 such that

Eθ0 supθ∈V(θ0)

ε2t

σ 2t

= Eθ0 supθ∈V(θ0)

σ 2t (θ0)

σ 2t

<∞.

Using (7.51), keeping the same choice of δ but taking s such that Eε4st <∞, the triangle inequality

gives ∥∥∥∥∥ supθ∈V(θ0)

ε2t

σ 2t

∥∥∥∥∥2

= κ1/2η

∥∥∥∥∥ supθ∈V(θ0)

σ 2t (θ0)

σ 2t

∥∥∥∥∥2

≤ κ1/2η K + κ1/2

η Kq

∞∑k=0

(1+ δ)kρks∥∥ε2s

t

∥∥2 <∞. (7.52)

Now consider the second term in braces in (7.50). Differentiating (7.46), (7.47) and (7.48),with the arguments used to show (7.43), we obtain

supθ∈�∗

1

σ 2t

∂3σ 2t

∂θi1∂θi2∂θi3≤ K,

when the indices i1, i2 and i3 are not all in {q + 1, q + 2, . . . , q + 1+ p} (that is, when thederivative is taken with respect to at least one parameter different from the βj ). Using again

Page 181: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 165

the arguments used to show (7.44) and (7.48), and then (7.45), we obtain

βiβjβk∂3σ 2

t

∂βi∂βj ∂βk≤

∞∑k=3

k(k − 1)(k − 2)Bk(1, 1)ct−k(1),

supθ∈�∗

1

σ 2t

∂3σ 2t

∂βi∂βj ∂βk≤ K

{supθ∈�∗

1

ωsβiβjβk

} ∞∑k=3

k(k − 1)(k − 2)ρks{

supθ∈�∗

ct−k(1)}s

for any s ∈ (0, 1). Since Eθ0

{supθ∈�∗ ct−k(1)

}2s<∞ for some s > 0, it follows that

Eθ0 supθ∈�∗

∣∣∣∣ 1

σ 2t

∂3σ 2t

∂θi∂θj ∂θk

∣∣∣∣2 <∞. (7.53)

It is easy to see that in this inequality the power 2 can be replaced by any power d:

Eθ0 supθ∈�∗

∣∣∣∣ 1

σ 2t

∂3σ 2t

∂θi∂θj ∂θk

∣∣∣∣d <∞.

Using the Cauchy–Schwarz inequality, (7.52) and (7.53), we obtain

Eθ0 supθ∈V(θ0)

∣∣∣∣{1− ε2t

σ 2t

}{1

σ 2t

∂3σ 2t

∂θi∂θj ∂θk

}∣∣∣∣ <∞.

The other terms in braces in (7.50) are handled similarly. We show in particular that

Eθ0 supθ∈�∗

∣∣∣∣ 1

σ 2t

∂2σ 2t

∂θi∂θj

∣∣∣∣d <∞, Eθ0 supθ∈�∗

∣∣∣∣ 1

σ 2t

∂σ 2t

∂θi

∣∣∣∣d <∞, (7.54)

for any integer d . With the aid of Holder’s inequality, this allows us to establish, in particular, that

Eθ0 supθ∈V(θ0)

∣∣∣∣{2− 6ε2t

σ 2t

}{1

σ 2t

∂σ 2t

∂θi

}{1

σ 2t

∂σ 2t

∂θj

}{1

σ 2t

∂σ 2t

∂θk

}∣∣∣∣≤∥∥∥∥∥ supθ∈V(θ0)

∣∣∣∣2− 6ε2t

σ 2t

∣∣∣∣∥∥∥∥∥

2

maxi

∥∥∥∥ supθ∈�∗

∣∣∣∣ 1

σ 2t

∂σ 2t

∂θi

∣∣∣∣∥∥∥∥3

6

<∞.

Thus we obtain (c).

(d) Asymptotic decrease of the effect of the initial values. Using (7.29), we obtain the analogsof (7.41) and (7.42) for the derivatives of σ 2

t :

∂σ 2t

∂ω=

t−1−q∑k=0

Bk1+q∑

k=1

Bt−k ∂ck∂ω

+ Bt ∂σ20

∂ω, (7.55)

∂σ 2t

∂αi=

t−1−q∑k=0

Bkε2t−k−i +

q∑k=1

Bt−k ∂ck∂αi

+ Bt ∂σ20

∂αi, (7.56)

∂σ 2t

∂βj=

t−1−q∑k=1

{k∑i=1

Bi−1B(j)Bk−i}ct−k +

q∑k=1

{t−k∑i=1

Bi−1B(j)Bt−k−i}ck, (7.57)

Page 182: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

166 GARCH MODELS

where ∂σ 20/∂ω is equal to (0, . . . , 0)′ when the initial conditions are given by (7.7), and is equal

to (1, . . . , 1)′ when the initial conditions are given by (7.6). The second-order derivatives havesimilar expressions. The compactness of � and the fact that ρ(B) < 1 together allow us to claimthat, almost surely,

supθ∈�

∥∥∥∥∂σ 2t

∂θ− ∂σ 2

t

∂θ

∥∥∥∥ < Kρt, supθ∈�

∥∥∥∥ ∂2σ 2t

∂θ∂θ ′− ∂2σ 2

t

∂θ∂θ ′

∥∥∥∥ < Kρt , ∀t. (7.58)

Using (7.30), we obtain

∣∣∣∣ 1

σ 2t

− 1

σ 2t

∣∣∣∣ = ∣∣∣∣ σ 2t − σ 2

t

σ 2t σ

2t

∣∣∣∣ ≤ Kρt

σ 2t

,σ 2t

σ 2t

≤ 1+Kρt . (7.59)

Since

∂ t (θ)

∂θ={

1− ε2t

σ 2t

}{1

σ 2t

∂σ 2t

∂θ

}and

∂ t (θ)

∂θ={

1− ε2t

σ 2t

}{1

σ 2t

∂σ 2t

∂θ

},

we have, using (7.59) and the first inequality in (7.58),

∣∣∣∣∂ t (θ0)

∂θi− ∂ t (θ0)

∂θi

∣∣∣∣ = ∣∣∣∣{ ε2t

σ 2t

− ε2t

σ 2t

}{1

σ 2t

∂σ 2t

∂θi

}+{

1− ε2t

σ 2t

}{1

σ 2t

− 1

σ 2t

}{∂σ 2

t

∂θi

}

+{

1− ε2t

σ 2t

}{1

σ 2t

}{∂σ 2

t

∂θi− ∂σ 2

t

∂θi

}∣∣∣∣ (θ0)

≤ Kρt(1+ η2t )

∣∣∣∣1+ { 1

σ 2t (θ0)

∂σ 2t (θ0)

∂θi

}∣∣∣∣ .It follows that∣∣∣∣∣n−1/2

n∑t=1

{∂ t (θ0)

∂θi− ∂ t (θ0)

∂θi

}∣∣∣∣∣ ≤ K∗n−1/2n∑t=1

ρt (1+ η2t )

∣∣∣∣1+ 1

σ 2t (θ0)

∂σ 2t (θ0)

∂θi

∣∣∣∣ . (7.60)

Markov’s inequality, (7.40), and the independence between ηt and σ 2t (θ0) imply that, for

all ε > 0,

P

(n−1/2

n∑t=1

ρt (1+ η2t )

∣∣∣∣1+ 1

σ 2t (θ0)

∂σt2(θ0)

∂θ

∣∣∣∣>ε

)

≤ 2

ε

(1+ Eθ0

∣∣∣∣ 1

σ 2t (θ0)

∂σ 2t (θ0)

∂θ

∣∣∣∣) n−1/2n∑t=1

ρt → 0,

which, by (7.60), shows the first part of (d).

Page 183: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 167

Now consider the asymptotic impact of the initial values on the second-order derivatives ofthe criterion in a neighborhood of θ0. In view of (7.39) and the previous computations, we have

supθ∈V(θ0)

∣∣∣∣∣n−1n∑t=1

{∂2 t (θ0)

∂θi∂θj− ∂2 t (θ0)

∂θi∂θj

}∣∣∣∣∣≤ n−1

n∑t=1

supθ∈V(θ0)

∣∣∣∣{ ε2t

σ 2t

− ε2t

σ 2t

}{1

σ 2t

∂2σ 2t

∂θi∂θj

}

+{

1− ε2t

σ 2t

}{(1

σ 2t

− 1

σ 2t

)∂2σ 2

t

∂θi∂θj+ 1

σ 2t

(∂2σ 2

t

∂θi∂θj− ∂2σ 2

t

∂θi∂θj

)}+{

2ε2t

σ 2t

− 2ε2t

σ 2t

}{1

σ 2t

∂σ 2t

∂θi

}{1

σ 2t

∂σ 2t

∂θj

}+{

2ε2t

σ 2t

− 1

}{(1

σ 2t

− 1

σ 2t

)∂σ 2

t

∂θi+ 1

σ 2t

(∂σ 2

t

∂θi− ∂σ 2

t

∂θi

)}{1

σ 2t

∂σ 2t

∂θj

}+{

2ε2t

σ 2t

− 1

}{1

σ 2t

∂σ 2t

∂θi

}{(1

σ 2t

− 1

σ 2t

)∂σ 2

t

∂θj+ 1

σ 2t

(∂σ 2

t

∂θj− ∂σ 2

t

∂θj

)}∣∣∣∣≤ Kn−1

n∑t=1

ρtϒt,

where

ϒt = supθ∈V(θ0)

{1+ ε2

t

σ 2t

}{1+ 1

σ 2t

∂2σ 2t

∂θi∂θj+ 1

σ 2t

∂σ 2t

∂θi

1

σ 2t

∂σ 2t

∂θj

}.

In view of (7.52), (7.54) and Holder’s inequality, it can be seen that, for a certain neighborhoodV(θ0), the expectation of ϒt is a finite constant. Using Markov’s inequality once again, the secondconvergence of (d) is then shown.

(e) CLT for martingale increments. The conditional score vector is obviously centered, whichcan be seen from (7.38), using the fact that σ 2

t (θ0) and its derivatives belong to the σ -field generatedby {εt−i , i ≥ 0}, and the fact that Eθ0

(ε2t |εu, u < t

) = σ 2t (θ0):

Eθ0

(∂

∂θ t (θ0) | εu, u < t

)= 1

σ 4t (θ0)

(∂

∂θσ 2t (θ0)

)Eθ0

(σ 2t (θ0)− ε2

t | εu, u < t) = 0.

Note also that, by (7.49), Varθ0 (∂ t (θ0)/∂θ) is finite. In view of the invertibility of J and theassumptions on the distribution of ηt (which entail 0 < κη − 1 <∞), this covariance matrix isnondegenerate. It follows that, for all λ ∈ Rp+q+1, the sequence

{λ′ ∂

∂θ t (θ0), εt

}t

is a squareintegrable ergodic stationary martingale difference. Corollary A.1 and the Cramer–Wold theorem(see, for example, Billingsley, 1995, pp. 383, 476 and 360) entail (e).

(f) Use of a second Taylor expansion and of the ergodic theorem. Consider the Taylor expansion(7.35) of the criterion at θ0. We have, for all i and j ,

n−1n∑t=1

∂2

∂θi∂θj t (θ

∗ij ) = n−1

n∑t=1

∂2

∂θi∂θj t (θ0)+ n−1

n∑t=1

∂θ ′

{∂2

∂θi∂θj t (θij )

} (θ∗ij − θ0

),

(7.61)

Page 184: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

168 GARCH MODELS

where θij is between θ∗ij and θ0. The almost sure convergence of θij to θ0, the ergodic theoremand (c) imply that almost surely

lim supn→∞

∥∥∥∥∥n−1n∑t=1

∂θ ′

{∂2

∂θi∂θj t (θij )

}∥∥∥∥∥ ≤ lim supn→∞

n−1n∑t=1

supθ∈V(θ0)

∥∥∥∥ ∂

∂θ ′

{∂2

∂θi∂θj t (θ)

}∥∥∥∥= Eθ0 sup

θ∈V(θ0)

∥∥∥∥ ∂

∂θ ′

{∂2

∂θi∂θj t (θ)

}∥∥∥∥ <∞.

Since ‖θ∗ij − θ0‖ → 0 almost surely, the second term on the right-hand side of (7.61) convergesto 0 with probability 1. By the ergodic theorem, the first term on the right-hand side of (7.61)converges to J (i, j).

To complete the proof of Theorem 7.2, it suffices to apply Slutsky’s lemma. In view of (d),(e) and (f) we obtain (7.36) and (7.37).

Proof of the Results of Section 7.1.3

Proof of Lemma 7.1. We have

ρnhn = ρnω0

{1+

n−1∑t=1

αt0η2n−1 . . . η

2n−t

}+ ρnαn0η

2n−1 . . . η

21ε

20

≥ ρnω0

n−1∏t=1

α0η2t .

Thus

lim infn→∞

1

nlog ρnhn ≥ lim

n→∞1

n

{log ρω0 +

n−1∑t=1

log ρα0η2t

}= E log ρα0η

21 > 0,

using (7.18) for the latter inequality. It follows that log ρnhn, and thus ρnhn, tend almost surelyto +∞ as n→∞. Now if ρnhn →+∞ and ρnε2

n = ρnhnη2n → +∞, then for any ε > 0, the

sequence (η2n) admits an infinite number of terms less than ε. Since the sequence (η2

n) is ergodicand stationary, we have P(η2

1 < ε)> 0. Since ε is arbitrary, we have P(η21 = 0)> 0, which is in

contradiction to (7.16). �

Proof of (7.19). Note that(ωn, αn) = arg min

θ∈�Qn(θ),

where

Qn(θ) = 1

n

n∑t=1

{ t (θ)− t (θ0)} .

We have

Qn(θ) = 1

n

n∑t=1

η2t

{σ 2t (θ0)

σ 2t (θ)

− 1

}+ log

σ 2t (θ)

σ 2t (θ0)

= 1

n

n∑t=1

η2t

(ω0 − ω)+ (α0 − α)ε2t−1

ω + αε2t−1

+ logω + αε2

t−1

ω0 + α0ε2t−1

.

Page 185: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 169

For all θ ∈ �, we have α = 0. Letting

On(α) = 1

n

n∑t=1

η2t

(α0 − α)

α+ log

α

α0

and

dt = α(ω0 − ω)− ω(α0 − α)

α(ω + αε2t−1)

,

we have

Qn(θ)−On(α) = 1

n

n∑t=1

η2t dt−1 + 1

n

n∑t=1

log(ω + αε2

t−1)α0

(ω0 + α0ε2t−1)α

→ 0 a.s.

since, by Lemma 7.1, ε2t →∞ almost surely as t →∞. It is easy to see that this convergence is

uniform on the compact set �:

limn→∞ sup

θ∈�|Qn(θ)−On(α)| = 0 a.s. (7.62)

Let α−0 and α+0 be two constants such that α−0 < α0 < α+0 . It can always be assumed that 0 < α−0 .With the notation σ 2

η = n−1∑nt=1 η

2t , the solution of

α∗n = arg minαOn(α)

is α∗n = α0σ2η . This solution belongs to the interval (α−0 , α

+0 ) when n is large enough. In this case

α∗∗n = arg minα ∈(α−0 ,α+0 )

On(α)

is one of the two extremities of the interval (α−0 , α+0 ), and thus

limn→∞On(α

∗∗n ) = min

{limn→∞On(α

−0 ), lim

n→∞On(α+0 )}> 0.

This result and (7.62) show that almost surely

limn→∞ min

θ∈�, α ∈(α−0 ,α+0 )Qn(θ)> 0.

Since minθ Qn(θ) ≤ Qn(θ0) = 0, it follows that

limn→∞ arg min

θ∈�Qn(θ) ∈ (0,∞)× (α−0 , α

+0 ).

Since (α−0 , α+0 ) is an interval which contains α0 and can be arbitrarily small, we obtain

the result. �

To prove the asymptotic normality of the QMLE, we need the following intermediate result.

Page 186: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

170 GARCH MODELS

Lemma 7.2 Under the assumptions of Theorem 7.3, we have

∞∑t=1

supθ∈�

∣∣∣∣ ∂∂ω t (θ)∣∣∣∣ <∞ a.s., (7.63)

∞∑t=1

supθ∈�

∥∥∥∥ ∂2

∂ω∂θ t (θ)

∥∥∥∥ <∞ a.s., (7.64)

1

n

n∑t=1

supθ∈�

∣∣∣∣∣ ∂2

∂α2 t (ω, α0)− 1

α20

∣∣∣∣∣ = o(1) a.s., (7.65)

1

n

n∑t=1

supθ∈�

∣∣∣∣ ∂3

∂α3 t (θ)

∣∣∣∣ = O(1) a.s. (7.66)

Proof. Using Lemma 7.1, there exists a real random variable K and a constant ρ ∈ (0, 1) inde-pendent of θ and of t such that∣∣∣∣ ∂∂ω t (θ)

∣∣∣∣ =∣∣∣∣∣−(ω0 + α0ε

2t−1)η

2t

(ω + αε2t−1)

2+ 1

ω + αε2t−1

∣∣∣∣∣ ≤ Kρt(η2t + 1).

Since it has a finite expectation, the series∑∞

t=1 Kρt(η2

t + 1) is almost surely finite. This shows(7.63), and (7.64) follows similarly. We have

∂2 t (ω, α0)

∂α2− 1

α20

={

2(ω0 + α0ε

2t−1)η

2t

ω + α0ε2t−1

− 1

}ε4t−1

(ω + α0ε2t−1)

2− 1

α20

= (2η2t − 1

) ε4t−1

(ω + α0ε2t−1)

2− 1

α20

+ r1,t

= 2(η2t − 1

) 1

α20

+ r1,t + r2,t

where

supθ∈�

|r1,t | = supθ∈�

∣∣∣∣∣ 2(ω0 − ω)η2t

(ω + α0ε2t−1)

ε4t−1

(ω + α0ε2t−1)

2

∣∣∣∣∣ = o(1) a.s.

and

supθ∈�

|r2,t | = supθ∈�

∣∣∣∣∣(2η2t − 1)

{ε4t−1

(ω + α0ε2t−1)

2− 1

α20

}∣∣∣∣∣= sup

θ∈�

∣∣∣∣∣(2η2t − 1)

{ω2 + 2α0ε

2t−1

α20(ω + α0ε

2t−1)

2

}∣∣∣∣∣= o(1) a.s.

as t →∞. Thus (7.65) is shown. To show (7.66), it suffices to note that∣∣∣∣ ∂3

∂α3 t (θ)

∣∣∣∣ =∣∣∣∣∣∣{

2− 6(ω0 + α0ε

2t−1)η

2t

ω + αε2t−1

}(ε2t−1

ω + αε2t−1

)3∣∣∣∣∣∣

≤{

2+ 6(ω0

ω+ α0

α

)η2t

} 1

α3.

Page 187: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 171

Proof of (7.20). We remark that we do not know, a priori , if the derivative of the criterionis equal to zero at θn = (ωn, αn), because we only have the convergence of αn to α0. Thus theminimum of the criterion could lie at the boundary of �, even asymptotically. By contrast, thepartial derivative with respect to the second coordinate must asymptotically vanish at the optimum,

since αn → α0 and θ0 ∈◦�. A Taylor expansion of the derivative of the criterion thus gives(

1√n

∑nt=1

∂∂ω t (θn)

0

)= 1√

n

n∑t=1

∂θ t (θ0)+ Jn

√n(θn − θ0), (7.67)

where Jn is a 2× 2 matrix whose elements are of the form

Jn(i, j) = 1

n

n∑t=1

∂2

∂θj ∂θj t (θ

∗i,j )

with θ∗i,j between θn and θ0. By Lemma 7.1, which shows that ε2t →∞ almost surely, and by the

central limit theorem of Lindeberg for martingale increment (see Corollary A.1),

1√n

n∑t=1

∂α t (θ0) = 1√

n

n∑t=1

(1− η2t )

ε2t−1

ω0 + α0ε2t−1

= 1√n

n∑t=1

(1− η2t )

1

α0+ oP (1)

L→ N(

0,κη − 1

α20

). (7.68)

Relation (7.64) of Lemma 7.2 and the compactness of � show that

Jn(2, 1)√n(ωn − ω0)→ 0 a.s. (7.69)

By a Taylor expansion of the function

α �→ 1

n

n∑t=1

∂2

∂α2 t (ω

∗2,2, α),

we obtain

Jn(2, 2) = 1

n

n∑t=1

∂2

∂α2 t (ω

∗2,2, α0)+ 1

n

n∑t=1

∂3

∂α3 t (ω

∗2,2, α

∗)(α∗2,2 − α0),

where α∗ is between α∗2,2 and α0. Using (7.65), (7.66) and (7.19), we obtain

Jn(2, 2)→ 1

α20

a.s. (7.70)

We conclude using the second row of (7.67), and also using (7.68), (7.69) and (7.70). �

Page 188: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

172 GARCH MODELS

Proof of Theorem 7.4The proof follows the steps of the proof of Theorem 7.1. We will show the following points:

(a) limn→∞ supϕ∈� |ln(ϕ)− ln(ϕ)| = 0, a.s.

(b)(∃t ∈ Z such that εt (ϑ) = εt (ϑ0) and σ 2

t (ϕ) = σ 2t (ϕ0)Pϕ0 a.s.

) �⇒ ϕ = ϕ0.

(c) If ϕ = ϕ0, Eϕ0 t (ϕ)>Eϕ0 t (ϕ0).

(d) For any ϕ = ϕ0 there exists a neighborhood V (ϕ) such that

lim infn→∞ inf

ϕ∗∈V (ϕ)ln(ϕ∗)>Eϕ0 1(ϕ0), a.s.

(a) Nullity of the asymptotic impact of the initial values. Equations (7.10)–(7.28) remain validunder the convention that εt = εt (ϑ). Equation (7.29) must be replaced by

σ 2t = ct + Bct−1 + · · · + Bt−1c1 + Bt σ 2

0, (7.71)

where ct = (ω +∑q

i=1 αiε2t−i , 0, . . . , 0)′, the ‘tilde’ variables being initialized as indicated before.

Assumptions A7 and A8 imply that,

for any k ≥ 1 and 1 ≤ i ≤ q, supϕ∈�

|εk−i − εk−i | ≤ Kρk, a.s. (7.72)

It follows that almost surely

‖ck − ck‖ ≤q∑i=1

|αi ||ε2k−i − ε2

k−i |

≤q∑i=1

|αi ||εk−i − εk−i |(|2εk−i | + |εk−i − εk−i |)

≤ Kρk

(q∑i=1

|εk−i | + 1

)and thus, by (7.28), (7.71) and (7.27),

‖σ 2t − σ 2

t ‖ =∥∥∥∥∥

t∑k=0

Bt−k(ck − ck)+ Bt (σ 20 − σ 2

0)

∥∥∥∥∥≤ K

t∑k=0

ρt−kρk(

q∑i=1

|εk−i | + 1

)+Kρt

≤ Kρtt∑

k=−q(|εk | + 1) . (7.73)

Similarly, we have that almost surely |ε2t − ε2

t | ≤ Kρt (|εt | + 1). The difference between thetheoretical log-likelihoods with and without initial values can thus be bounded as follows:

supϕ∈�

|ln(ϕ)− ln(ϕ)| ≤ n−1n∑t=1

supϕ∈�

{∣∣∣∣ σ 2t − σ 2

t

σ 2t σ

2t

∣∣∣∣ ε2t +

∣∣∣∣log

(1+ σ 2

t − σ 2t

σ 2t

)∣∣∣∣+ |ε2t − ε2

t |σ 2t

}

≤{

supϕ∈�

max(1

ω2,

1

ω)

}Kn−1

n∑t=1

ρt (ε2t + 1)

t∑k=−q

(|εk | + 1) .

Page 189: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 173

This inequality is analogous to (7.31), ε2t + 1 being replaced by ξt = (ε2

t + 1)∑t

k=−q (|εk| + 1).Following the lines of the proof of (a) in Theorem 7.1 (see Exercise 7.2), it suffices to show thatfor all real r > 0, E(ρtξt )r is the general term of a finite series. Note that7

E(ρt ξt )s/2 ≤ ρts/2

t∑k=−q

E(ε2t |εk | + ε2

t + |εk | + 1)s/2

≤ ρts/2t∑

k=−q[{E(ε2s

t )E|εk|s}1/2 + E|εt |s + E|εk|s/2 + 1]

= O(tρts/2),

since, by Corollary 2.3, E(ε2st

)<∞. Statement (a) follows.

(b) Identifiability of the parameter. If εt (ϑ) = εt (ϑ0) almost surely, assumptions A8 and A9imply that there exists a constant linear combination of the variables Xt−j , j ≥ 0. The linearinnovation of (Xt ), equal to Xt − E(Xt |Xu, u < t) = ηtσt (ϕ0), is zero almost surely only if ηt = 0a.s. (since σ 2

t (ϕ0) ≥ ω0 > 0). This is precluded, since E(η2t ) = 1. It follows that ϑ = ϑ0, and thus

that θ = θ0 by the argument used in the proof of Theorem 7.1.

(c) The limit criterion is minimized at the true value. By the arguments used in the proofof (c) in Theorem 7.1, it can ne shown that, for all ϕ, Eϕ0 ln(ϕ) = Eϕ0 t (ϕ) is defined in R ∪ {+∞},and in R at ϕ = ϕ0. We have

Eϕ0 t (ϕ)− Eϕ0 t (ϕ0) = Eϕ0 logσ 2t (ϕ)

σ 2t (ϕ0)

+ Eϕ0

[ε2t (ϑ)

σ 2t (ϕ)

− ε2t (ϑ0)

σ 2t (ϕ0)

]= Eϕ0

{log

σ 2t (ϕ)

σ 2t (ϕ0)

+ σ 2t (ϕ0)

σ 2t (ϕ)

− 1

}+Eϕ0

{εt (ϑ)− εt (ϑ0)}2σ 2t (ϕ)

+Eϕ0

2ηtσt (ϕ0){εt (ϑ)− εt (ϑ0)}σ 2t (ϕ)

≥ 0

because the last expectation is equal to 0 (noting that εt (ϑ)− εt (ϑ0) belongs to the past, as wellas σt (ϕ0) and σt (ϕ)), the other expectations being positive or null by arguments already used. Thisinequality is strict only if εt (ϑ) = εt (ϑ0) and if σ 2

t (ϕ) = σ 2t (ϕ0) Pϕ0 a.s. which, by (b), implies

ϕ = ϕ0 and completes the proof of (c).

(d) Use of the compactness of and of the ergodicity of (�t (ϕ)). The end of the proof is thesame as that of Theorem 7.1.

Proof of Theorem 7.5The proof follows the steps of that of Theorem 7.2. The block-diagonal form of the matricesI and J when the distribution of ηt is symmetric is shown in Exercise 7.7. It suffices to establishthe following properties.

7 We use the fact that if X and Y are positive random variables, E(X + Y)r ≤ E(X)r + E(Y)r for allr ∈ (0, 1], this inequality being trivially obtained from the inequality already used: (a + b)r ≤ ar + br for allpositive real numbers a and b.

Page 190: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

174 GARCH MODELS

(a) Eϕ0

∥∥∥ ∂ t (ϕ0)∂ϕ

∂ t (ϕ0)∂ϕ′

∥∥∥ <∞, Eϕ0

∥∥∥ ∂2 t (ϕ0)∂ϕ∂ϕ′

∥∥∥ <∞.

(b) I and J are invertible.

(c)∥∥∥n−1/2∑n

t=1

{∂ t (ϕ0)∂ϕ

− ∂ t (ϕ0)∂ϕ

}∥∥∥ and supϕ∈V(ϕ0)

∥∥∥n−1 ∑nt=1

{∂2 t (ϕ)∂ϕ∂ϕ′ − ∂2 t (ϕ)

∂ϕ∂ϕ′}∥∥∥ tend in

probability to 0 as n→∞.

(d) n−1/2∑nt=1

∂ t∂ϕ(ϕ0)⇒ N (0, I).

(e) n−1∑nt=1

∂2 t∂ϕi∂ϕj

(ϕ∗)→ J(i, j) a.s., for all ϕ∗ between ϕn and ϕ0.

Formulas (7.38) and (7.39) giving the derivatives with respect to the GARCH parameters (thatis, the vector θ ) remain valid in the presence of an ARMA part (writing ε2

t = ε2t (ϑ)). The same is

true for all the results established in (a) and (b) of the proof of Theorem 7.2, with obvious changesof notation. The derivatives of t (ϕ) = ε2

t (ϑ)/σ2t (ϕ)+ log σ 2

t (ϕ) with respect to the parameter ϑ ,and the cross derivatives with respect to θ and ϑ , are given by

∂ t (ϕ)

∂ϑ=(

1− ε2t

σ 2t

)1

σ 2t

∂σ 2t

∂ϑ+ 2εt

σ 2t

∂εt

∂ϑ, (7.74)

∂2 t (ϕ)

∂ϑ∂ϑ ′=(

1− ε2t

σ 2t

)1

σ 2t

∂2σ 2t

∂ϑ∂ϑ ′+(

2ε2t

σ 2t

− 1

)1

σ 2t

∂σ 2t

∂ϑ

1

σ 2t

∂σ 2t

∂ϑ ′

+ 2

σ 2t

∂εt

∂ϑ

∂εt

∂ϑ ′+ 2εt

σ 2t

∂2εt

∂ϑ∂ϑ ′− 2εt

σ 2t

(∂εt

∂ϑ

1

σ 2t

∂σ 2t

∂ϑ ′+ 1

σ 2t

∂σ 2t

∂ϑ

∂εt

∂ϑ ′

), (7.75)

∂2 t (ϕ)

∂ϑ∂θ ′=(

1− ε2t

σ 2t

)1

σ 2t

∂2σ 2t

∂ϑ∂θ ′+(

2ε2t

σ 2t

− 1

)1

σ 2t

∂σ 2t

∂ϑ

1

σ 2t

∂σ 2t

∂θ ′− 2εt

σ 2t

∂εt

∂ϑ

1

σ 2t

∂σ 2t

∂ϑ ′.

(7.76)

The derivatives of εt are of the form

∂εt

∂ϑ= (−Aϑ(1)B

−1ϑ (1), vt−1(ϑ), . . . , vt−P (ϑ), ut−1(ϑ), . . . , ut−Q(ϑ))′

where

vt (ϑ) = −A−1ϑ (B)εt (ϑ), ut (ϑ) = B−1

ϑ (B)εt (ϑ) (7.77)

and

∂2εt

∂ϑ∂ϑ ′=

0P+1,P+1

01,Q

−A−1ϑ (B)B−1

ϑ (B)HP,Q(t)

0Q,1 −A−1ϑ (B)B−1

ϑ (B)HQ,P (t) −2B−2ϑ (B)HQ,Q(t)

,

(7.78)

where Hk, (t) is the k × (Hankel) matrix of general term εt−i−j , and 0k, denotes the null matrixof size k × . Moreover, by (7.28),

∂σ 2t

∂ϑj=

∞∑k=0

Bk(1, 1)q∑i=1

2αiεt−k−i∂εt−k−i∂ϑj

, (7.79)

Page 191: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 175

where ϑj denotes the j th component of ϑ , and

∂2σ 2t

∂ϑj ∂ϑ =

∞∑k=0

Bk(1, 1)q∑i=1

2αi

(∂εt−k−i∂ϑj

∂εt−k−i∂ϑ

+ εt−k−i∂2εt−k−i∂ϑj ∂ϑ

). (7.80)

(a) Integrability of the derivatives of the criterion at φ0. The existence of the expectations in(7.40) remains true. By (7.74)–(7.76), the independence between (εt /σt )(ϕ0) = ηt and σ 2

t (ϕ0), itsderivatives, and the derivatives of εt (ϑ0), using E(η4

t ) <∞ and σ 2t (ϕ0)>ω0 > 0, it suffices to

show that

Eϕ0

∥∥∥∥∂εt∂ϑ

∂εt

∂ϑ ′(θ0)

∥∥∥∥ <∞, Eϕ0

∥∥∥∥ ∂2εt

∂ϑ∂ϑ ′(θ0)

∥∥∥∥ <∞, (7.81)

Eϕ0

∥∥∥∥ 1

σ 4t

∂σ 2t

∂ϑ

∂σ 2t

∂ϑ ′(ϕ0)

∥∥∥∥ <∞, Eϕ0

∥∥∥∥ ∂2σ 2t

∂ϑ∂ϑ ′(ϕ0)

∥∥∥∥ <∞, Eϕ0

∥∥∥∥ ∂2σ 2t

∂θ∂ϑ ′(ϕ0)

∥∥∥∥ <∞ (7.82)

to establish point (a), together with the existence of the matrices I and J. By the expressions forthe derivatives of εt , (7.77)-(7.78), and using Eε2

t (ϑ0) <∞, we obtain (7.81).The Cauchy–Schwarz inequality implies that∣∣∣∣∣q∑i=1

α0iεt−k−i (ϑ0)∂εt−k−i (ϑ0)

∂ϑj

∣∣∣∣∣ ≤{

q∑i=1

α0iε2t−k−i (ϑ0)

}1/2 { q∑i=1

α0i

(∂εt−k−i (ϑ0)

∂ϑj

)2}1/2

.

Thus, in view of (7.79) and the positivity of ω0,

∂σ 2t

∂ϑj(ϕ0) ≤ 2

∞∑k=0

Bk0 (1, 1)

{ω0 +

q∑i=1

α0iε2t−k−i (ϑ0)

}1/2 { q∑i=1

α0i

(∂εt−k−i (ϑ0)

∂ϑj

)2}1/2

.

Using the triangle inequality and the elementary inequalities (∑ |xi |)1/2 ≤∑ |xi |1/2 and x/(1+

x2) ≤ 1, it follows that

∥∥∥∥ 1

σ 2t

∂σ 2t

∂ϑj(ϕ0)

∥∥∥∥2

≤∥∥∥∥∥∥2

∞∑k=0

Bk/2(1, 1)c1/2t−k(1)B

k/2(1, 1)∑q

i=1 α1/2i

∣∣∣ ∂εt−k−i∂ϑj

∣∣∣ω + Bk(1, 1)ct−k(1)

(ϕ0)

∥∥∥∥∥∥2

∥∥∥∥∥∥∥∥∥2√ω

∞∑k=0

Bk/2(1, 1)

Bk/2(1,1)c1/2t−k (1)√

ω

1+(Bk/2(1,1)c1/2

t−k(1)√ω

)2

q∑i=1

α1/2i

∣∣∣∣∂εt−k−i∂ϑj

∣∣∣∣ (ϕ0)

∥∥∥∥∥∥∥∥∥2

≤ K√ω0

∞∑k=0

ρk/2q∑i=1

α1/20i

∥∥∥∥∂εt−k−i (ϑ0)

∂ϑj

∥∥∥∥2

<∞. (7.83)

The first inequality of (7.82) follows. The existence of the second expectation in (7.82) is aconsequence of (7.80), the Cauchy–Schwarz inequality, and the square integrability of εt and itsderivatives. To handle the second-order partial derivatives of σ 2

t , first note that (∂2σ 2t /∂ϑ∂ω)(ϕ0) =

0 by (7.41). Moreover, using (7.79),

Eϕ0

∣∣∣∣ ∂2σ 2t

∂ϑj ∂αi(ϕ0)

∣∣∣∣ = Eϕ0

∣∣∣∣∣2∞∑k=0

Bk(1, 1)εt−k−i∂εt−k−i∂ϑj

(ϕ0)

∣∣∣∣∣ <∞. (7.84)

Page 192: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

176 GARCH MODELS

By the arguments used to show (7.44), we obtain

β0 Eϕ0

∣∣∣∣ ∂2σ 2t

∂ϑj∂β (ϕ0)

∣∣∣∣ ≤ Eϕ0

∣∣∣∣∣∞∑k=1

kBk(1, 1)q∑i=1

2α0iεt−k−i∂εt−k−i∂ϑj

(ϕ0)

∣∣∣∣∣ <∞, (7.85)

which entails the existence of the third expectation in (7.82).

(b) Invertibility of I and J . Assume that I is noninvertible. There exists a nonzero vector λ inRP+Q+p+q+2 such that λ′∂ t (ϕ0)/∂ϕ

′ = 0 a.s. By (7.38) and (7.74), this implies that

(1− η2

t

) 1

σ 2t (ϕ0)

λ′∂σ 2

t (ϕ0)

∂ϕ+ 2ηtσt (ϕ0)

λ′∂εt (ϑ0)

∂ϕ= 0 a.s. (7.86)

Taking the variance of the left-hand side, conditionally on the σ -field generated by {ηu, u < t},we obtain a.s., at ϕ = ϕ0,

0 = (κη − 1)

(1

σ 2t

λ′∂σ 2

t

∂ϕ

)2

− 2νη1

σ 2t

λ′∂σ 2

t

∂ϕ

2

σtλ′∂εt

∂ϕ+(

2

σtλ′∂εt

∂ϕ

)2

:= (κη − 1)a2t − 2νηatbt + b2

t

= (κη − 1− ν2η)a

2t + (bt − νηat )

2,

where νη = E(η3t ). It follows that κη − 1− ν2

η ≤ 0 and bt = at {νη ± (ν2η + 1− κη)

1/2} a.s.By stationarity, we have either bt = at {νη − (ν2

η + 1− κη)1/2} a.s. for all t , or bt = at {νη + (ν2

η +1− κη)

1/2} a.s. for all t . Consider for instance the latter case, the first one being treated similarly.Relation (7.86) implies at [1− η2

t + {νη + (ν2η + 1− κη)

1/2}ηt ] = 0 a.s. The term in bracketscannot vanish almost surely, otherwise ηt would take at least two different values, which wouldbe in contradiction to assumption A12. It follows that at = 0 a.s. and thus bt = 0 a.s. We haveshown that almost surely

λ′∂εt (ϕ0)

∂ϕ= λ′1

∂εt (ϑ0)

∂ϑ= 0 and λ′

∂σ 2t (ϕ0)

∂ϕ= 0, (7.87)

where λ1 is the vector of the first P +Q+ 1 components of λ. By stationarity of (∂εt /∂ϕ)t , thefirst equality implies that

0 = λ′1

−Aϑ0(1)c0 −Xt−1

...

c0 −Xt−Pεt−1...

εt−Q

+

Q∑j=1

b0jλ′1∂εt−j (ϑ0)

∂ϑ= λ′1

−Aϑ0(1)c0 −Xt−1

...

c0 −Xt−Pεt−1...

εt−Q

.

We now use assumption A9, that the ARMA representation is minimal, to conclude that λ1 = 0.

The third equality in (7.87) is then written, with obvious notation, as λ′2∂σ2

t (ϕ0)

∂θ= 0. We have

already shown in the proof of Theorem 7.2 that this entails λ2 = 0. We are led to a contradiction,which proves that I is invertible. Using (7.39) and (7.75)-(7.76), we obtain

J = Eϕ0

(1

σ 4t

∂σ 2t

∂ϕ

∂σ 2t

∂ϕ′(ϕ0)

)+ 2Eϕ0

(1

σ 2t

∂εt

∂ϕ

∂εt

∂ϕ′(ϕ0)

).

Page 193: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 177

We have just shown that the first expectation is a positive definite matrix. The second expectationbeing a positive semi-definite matrix, J is positive definite and thus invertible, which completesthe proof of (b).

(c) Asymptotic unimportance of the initial values. The initial values being fixed, the derivativesof σ 2

t , obtained from (7.71), are given by

∂σ 2t

∂ω=

t−1∑k=0

Bk1,∂σ 2

t

∂αi=

t−1∑k=0

Bkε2t−k−i ,

∂σ 2t

∂βj=

t−1∑k=1

{k∑i=1

Bi−1B(j)Bk−i}ct−k,

with the notation introduced in (7.41)–(7.42) and (7.55)–(7.56). As for (7.79), we obtain

∂σ 2t

∂ϑj=

t−1∑k=0

Bk(1, 1)q∑i=1

2αiεt−k−i∂εt−k−i∂ϑj

and, by an obvious extension of (7.72),

supϕ∈�

max

{|εk − εk |,

∣∣∣∣ ∂εk∂ϑj− ∂εk

∂ϑj

∣∣∣∣} ≤ Kρk, a.s. (7.88)

Thus ∣∣∣∣∂σ 2t

∂ϑj− ∂σ 2

t

∂ϑj

∣∣∣∣≤∣∣∣∣∣∞∑k=t

Bk(1, 1)q∑i=1

2αiεt−k−i∂εt−k−i∂ϑj

∣∣∣∣∣+

t−1∑k=0

Bk(1, 1)q∑i=1

2αi

∣∣∣∣(εt−k−i − εt−k−i )∂εt−k−i∂ϑj

+ εt−k−i(∂εt−k−i∂ϑj

− ∂εt−k−i∂ϑj

)∣∣∣∣≤ Kρt

∞∑k=0

ρk∣∣∣∣ε−k−1

∂ε−k−1

∂ϑj

∣∣∣∣+K

t−1∑k=0

ρkq∑i=1

ρt−k−i{∣∣∣∣∂εt−k−i∂ϑj

∣∣∣∣+ |εt−k−i |}

≤ Kρt∞∑k=0

ρk∣∣∣∣ε−k−1

∂ε−k−1

∂ϑj

∣∣∣∣+Kρt/2t−1+q∑k=1

ρk/2ρt−k

2

{∣∣∣∣∂εt−k∂ϑj

∣∣∣∣+ |εt−k|}

≤ Kρt/2∞∑k=0

ρk/2{∣∣∣∣ε−k−1

∂ε−k−1

∂ϑj

∣∣∣∣+ ∣∣∣∣∂εk+1−q∂ϑj

∣∣∣∣+ ∣∣εk+1−q∣∣} .

The latter sum converges almost surely because its expectation is finite. We have thus shown that

supϕ∈�

∣∣∣∣∂σ 2t

∂ϑj− ∂σ 2

t

∂ϑj

∣∣∣∣ ≤ Kρt a.s.

The other derivatives of σ 2t are handled similarly, and we obtain

supϕ∈�

∥∥∥∥∂σ 2t

∂ϕ− ∂σ 2

t

∂ϕ

∥∥∥∥ < Kρt a.s.

Page 194: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

178 GARCH MODELS

We have, in view of (7.73),∣∣∣∣ 1

σ 2t

− 1

σ 2t

∣∣∣∣ = ∣∣∣∣ σ 2t − σ 2

t

σ 2t σ

2t

∣∣∣∣ ≤ K

σ 2t

ρtSt−1,σ 2t

σ 2t

≤ 1+KρtSt−1,

where St−1 =∑t−1

k=1−q (|εk| + 1) . It is also easy to check that for ϕ = ϕ0,

∣∣ε2t − ε2

t

∣∣ ≤ Kρt(1+ σtηt ),

∣∣∣∣1− ε2t

σ 2t

∣∣∣∣ ≤ 1+ η2t +Kρt (1+ |ηt |St−1 + η2

t St−1).

It follows that, using (7.88),∣∣∣∣∂ t (ϕ0)

∂ϕi− ∂ t (ϕ0)

∂ϕi

∣∣∣∣ = ∣∣∣∣{ε2t − ε2

t

} { 1

σ 2t

}{1

σ 2t

∂σ 2t

∂ϕi

}+ ε2

t

{1

σ 2t

− 1

σ 2t

}{1

σ 2t

∂σ 2t

∂ϕi

}+{

1− ε2t

σ 2t

}{1

σ 2t

− 1

σ 2t

}{∂σ 2

t

∂ϕi

}+{

1− ε2t

σ 2t

}{1

σ 2t

}{∂σ 2

t

∂ϕi− ∂σ 2

t

∂ϕi

}+ 2 {εt − εt }

{1

σ 2t

}{∂εt

∂ϕi

}+ 2εt

{1

σ 2t

− 1

σ 2t

}{∂εt

∂ϕi

}+ 2εt

{1

σ 2t

}{∂εt

∂ϕi− ∂εt

∂ϕi

}∣∣∣∣ (ϕ0)

≤ Kρt{1+ S2

t−1(|ηt | + η2t )} {

1+ 1

σ 2t

∂σ 2t

∂ϕi+∣∣∣∣ ∂εt∂ϕi

∣∣∣∣} (ϕ0).

Using the independence between ηt and St−1, (7.40), (7.83), the Cauchy–Schwarz inequality andE(ε4

t ) <∞, we obtain

P

(∣∣∣∣∣n−1/2n∑t=1

{∂ t (ϕ0)

∂ϕi− ∂ t (ϕ0)

∂ϕi

}∣∣∣∣∣>ε

)

≤ K

ε

∥∥∥∥1+ 1

σ 2t

∂σ 2t

∂ϕi(ϕ0)+

∣∣∣∣ ∂εt∂ϕi

∣∣∣∣ (ϕ0)

∥∥∥∥2

n−1/2n∑t=1

ρt{1+ (‖η2

t ‖2 + ‖ηt‖2)‖S2t−1‖2

}≤ K

ε

∥∥∥∥1+ 1

σ 2t

∂σ 2t

∂ϕi(ϕ0)+

∣∣∣∣ ∂εt∂ϕi

∣∣∣∣ (ϕ0)

∥∥∥∥2

n−1/2n∑t=1

ρt t2 → 0,

which shows the first part of (c). The second is established by the same arguments.

(d) Use of a CLT for martingale increments. The proof of this point is exactly the same as thatof the pure GARCH case (see the proof of Theorem 7.2).

(e) Convergence to the matrix J . This part of the proof differs drastically from that ofTheorem 7.2. For pure GARCH, we used a Taylor expansion of the second-order derivativesof the criterion, and showed that the third-order derivatives were uniformly integrable in aneighborhood of θ0. Without additional assumptions, this argument fails in the ARMA-GARCHcase because variables of the form σ−2

t (∂σ 2t /∂ϑ) do not necessarily have moments of all orders,

even at the true value of the parameter. First note that, since J exists, the ergodic theoremimplies that

limn→∞

1

n

n∑t=1

∂2 t (ϕ0)

∂ϕ∂ϕ′= J a.s.

Page 195: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 179

The consistency of ϕn having already been established, it suffices to show that for all ε > 0, thereexists a neighborhood V(ϕ0) of ϕ0 such that almost surely

limn→∞

1

n

n∑t=1

supϕ∈V(ϕ0)

∥∥∥∥∂2 t (ϕ)

∂ϕ∂ϕ′− ∂2 t (ϕ0)

∂ϕ∂ϕ′

∥∥∥∥ ≤ ε (7.89)

(see Exercise 7.9). We first show that there exists V(ϕ0) such that

E supϕ∈V(ϕ0)

∥∥∥∥∂2 t (ϕ)

∂ϕ∂ϕ′

∥∥∥∥ <∞. (7.90)

By Holder’s inequality, (7.39), (7.75) and (7.76), it suffices to show that for any neighborhoodV(ϕ0) ⊂ � whose elements have their components αi and βj bounded above by a positive constant,the quantities ∥∥∥∥∥ sup

ϕ∈V(ϕ0)

ε2t

∥∥∥∥∥2

,

∥∥∥∥∥ supϕ∈V(ϕ0)

∣∣∣∣∂εt∂ϑ

∣∣∣∣∥∥∥∥∥

4

,

∥∥∥∥∥ supϕ∈V(ϕ0)

∣∣∣∣ ∂2εt

∂ϑ∂ϑ ′

∣∣∣∣∥∥∥∥∥

4

, (7.91)

∥∥∥∥∥ supϕ∈V(ϕ0)

1

σ 2t

∥∥∥∥∥∞,

∥∥∥∥∥ supϕ∈V(ϕ0)

∣∣∣∣ 1

σ 2t

∂σ 2t

∂ϑ

∣∣∣∣∥∥∥∥∥

4

,

∥∥∥∥∥ supϕ∈V(ϕ0)

∣∣∣∣ 1

σ 2t

∂σ 2t

∂θ

∣∣∣∣∥∥∥∥∥

4

, (7.92)

∥∥∥∥∥ supϕ∈V(ϕ0)

∣∣∣∣ ∂2σ 2t

∂ϑ∂ϑ ′

∣∣∣∣∥∥∥∥∥

2

,

∥∥∥∥∥ supϕ∈V(ϕ0)

∣∣∣∣ ∂2σ 2t

∂θ∂ϑ ′

∣∣∣∣∥∥∥∥∥

2

,

∥∥∥∥∥ supϕ∈V(ϕ0)

∣∣∣∣ 1

σ 2t

∂2σ 2t

∂θ∂θ ′

∣∣∣∣∥∥∥∥∥

2

(7.93)

are finite. Using the expansion of the series

εt (ϑ) = Aϑ(B)B−1ϑ (B)A−1

ϑ0(B)Bϑ0 (B)εt (ϑ0),

similar expansions for the derivatives, and ‖εt (ϑ0)‖4 <∞, it can be seen that the norms in (7.91)are finite. In (7.92) the first norm is finite, as an obvious consequence of σ 2

t ≥ infφ∈� ω, this latterterm being strictly positive by compactness of �. An extension of inequality (7.83) leads to

supϕ∈�

∥∥∥∥ 1

σ 2t

∂σ 2t

∂ϑj(ϕ)

∥∥∥∥4

≤ K

∞∑k=0

ρk/2 supϕ∈�

∥∥∥∥∂εt−k∂ϑj

∥∥∥∥4

<∞.

Moreover, since (7.41)–(7.44) remain valid when εt is replaced by εt (ϑ), it can be shown that

supϕ∈V(ϕ0)

∥∥∥∥ 1

σ 2t

∂σ 2t

∂θ(ϕ)

∥∥∥∥d

<∞

for any d > 0 and any neighborhood V(ϕ0) whose elements have their components αi and βjbounded from below by a positive constant. The norms in (7.92) are thus finite. The existence ofthe first norm of (7.93) follows from (7.80) and (7.91). To handle the second one, we use (7.84),(7.85), (7.91), and the fact that supϕ∈V(ϕ0)

β−1j <∞. Finally, it can be shown that the third norm

is finite by (7.47), (7.48) and by arguments already used. The property (7.90) is thus established.The ergodic theorem shows that the limit in (7.89) is equal almost surely to

E supϕ∈V(ϕ0)

∥∥∥∥∂2 t (ϕ)

∂ϕ∂ϕ′− ∂2 t (ϕ0)

∂ϕ∂ϕ′

∥∥∥∥ .

Page 196: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

180 GARCH MODELS

By the dominated convergence theorem, using (7.90), this expectation tends to 0 when theneighborhood V(ϕ0) tends to the singleton {ϕ0}. Thus (7.89) hold true, which proves (e). Theproof of Theorem 7.5 is now complete.

7.5 Bibliographical Notes

The asymptotic properties of the QMLE of the ARCH models have been established by Weiss(1986) under the condition that the moment of order 4 exists. In the GARCH(1, 1) case, theasymptotic properties have been established by Lumsdaine (1996) (see also Lee and Hansen, 1994)for the local QMLE under the strict stationarity assumption. In Lumsdaine (1996) the conditionson the coefficients α1 and β1 allow to handle the IGARCH(1, 1) model. They are, however, veryrestrictive with regard to the iid process: it is assumed that E|ηt |32 <∞ and that the density ofηt has a unique mode and is bounded in a neighborhood of 0. In Lee and Hansen (1994) theconsistency of the global estimator is obtained under the assumption of second-order stationarity.

Berkes, Horvath and Kokoszka (2003b) was the first paper to give a rigorous proof of theasymptotic properties of the QMLE in the GARCH(p, q) case under very weak assumptions; seealso Berkes and Horvath (2003b, 2004), together with Boussama (1998, 2000). The assumptionsgiven in Berkes, Horvath and Kokoszka (2003b) were weakened slightly in Francq and Zakoıan(2004). The proofs presented here come from that paper. An extension to non-iid errors wasrecently proposed by Escanciano (2009).

Jensen and Rahbek (2004a, 2004b) have shown that the parameter α0 of an ARCH(1) model, orthe parameters α0 and β0 of a GARCH(1, 1) model, can be consistently estimated, with a standardGaussian asymptotic distribution and a standard rate of convergence, even if the parameters areoutside the strict stationarity region. They considered a constrained version of the QMLE, inwhich the intercept ω is fixed (see Exercises 7.13 and 7.14). These results were misunderstood bya number of researchers and practitioners, who wrongly claimed that the QMLE of the GARCHparameters is consistent and asymptotically normal without any stationarity constraint. We haveseen in Section 7.1.3 that the QMLE of ω0 is inconsistent in the nonstationary case.

For ARMA-GARCH models, asymptotic results have been established by Ling and Li (1997,1998), Ling and McAleer (2003a, 2003b) and Francq and Zakoıan (2004). A comparison of theassumptions used in these papers can be found in the last reference. We refer the reader to Strau-mann (2005) for a detailed monograph on the estimation of GARCH models, to Francq and Zakoıan(2009a) for a recent review of the literature, and to Straumann and Mikosch (2006) and Bardet andWintenberger (2009) for extensions to other conditionally heteroscedastic models. Li, Ling andMcAleer (2002) reviewed the literature on the estimation of ARMA-GARCH models, includingin particular the case of nonstationary models.

The proof of the asymptotic normality of the QMLE of ARMA models under the second-ordermoment assumption can be found, for instance, in Brockwell and Davis (1991). For ARMA modelswith infinite variance noise, see Davis, Knight and Liu (1992), Mikosch, Gadrich, Kluppelberg andAdler (1995) and Kokoszka and Taqqu (1996).

7.6 Exercises

7.1 (The distribution of ηt is symmetric for GARCH models)The aim of this exercise is to show property (7.24).

1. Show the result for j < 0.

2. For j ≥ 0, explain why E{g(ε2

t , ε2t−1, . . . )|εt−j , εt−j−1, . . . )

}can be written as

h(ε2t−j , ε

2t−j−1, . . . ) for some function h.

3. Complete the proof of (7.24).

Page 197: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 181

7.2 (Almost sure convergence to zero at an exponential rate)Let (εt ) be a strictly stationary process admitting a moment order s > 0. Show that ifρ ∈ (0, 1), then ρtε2

t → 0 a.s.

7.3 (Ergodic theorem for nonintegrable processes)Prove the following ergodic theorem. If (Xt ) is an ergodic and strictly stationary process andif EX1 exists in R ∪ {+∞}, then

n−1n∑t=1

Xt → EX1 a.s., as n→∞.

The result is shown in Billingsley (1995, p. 284) for iid variables.Hint : Consider the truncated variables Xκ

t = Xt 1lXt≤κ where κ > 0 with κ tending to +∞.

7.4 (Uniform ergodic theorem)Let {Xt(θ)} be a process of the form

Xt(θ) = f (θ, ηt , ηt−1, . . .), (7.94)

where (ηt ) is strictly stationary and ergodic and f is continuous in θ ∈ �, � being a compactsubset of Rd .

1. Show that the process {infθ∈� Xt (θ)} is strictly stationary and ergodic.

2. Does the property still hold true if Xt(θ) is not of the form (7.94) but it is assumed that{Xt(θ)} is strictly stationary and ergodic and that Xt(θ) is a continuous function of θ?

7.5 (OLS estimator of a GARCH)In the framework of the GARCH(p, q) model (7.1), an OLS estimator of θ is defined as anymeasurable solution θn of

θn = arg minθ∈�

Qn(θ), � ⊂ Rp+q+1,

where

Qn(θ) = n−1n∑t=1

e2t (θ), et (θ) = ε2

t − σ 2t (θ),

and σ 2t (θ) is defined by (7.4) with, for instance, initial values given by (7.6) or (7.7). Note

that the estimator is unconstrained and that the variable σ 2t (θ) can take negative values.

Similarly, a constrained OLS estimator is defined by

θ cn = arg minθ∈�c

Qn(θ), �c ⊂ (0,+∞)× [0,+∞)p+q .

The aim of this exercise is to show that under the assumptions of Theorem 7.1, and ifEθ0ε

4t <∞, the constrained and unconstrained OLS estimators are strongly consistent. We

consider the theoretical criterion

Qn(θ) = n−1n∑t=1

e2t (θ), et (θ) = ε2

t − σ 2t (θ).

Page 198: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

182 GARCH MODELS

1. Show that supθ∈�∣∣Qn(θ)−Qn(θ)

∣∣→ 0 almost surely as n→∞.

2. Show that the asymptotic criterion is minimized at θ0,

∀θ ∈ �, limn→∞Q(θ) ≥ lim

n→∞Q(θ0),

and that θ0 is the unique minimum.

3. Prove that θn → θ0 almost surely as n→∞.

4. Show that θ cn → θ0 almost surely as n→∞.

7.6 (The mean of the squares of the normalized residuals is equal to 1)For a GARCH model, estimated by QML with initial values set to zero, the normalizedresiduals are defined by ηt = εt /σt (θn), t = 1, . . . , n. Show that almost surely

1

n

n∑t=1

η2t = 1.

Hint : Note that for all c > 0, there exists θ∗n such that σ 2t (θ

∗n ) = cσ 2

t (θn) for all t ≥ 0, andconsider the function c �→ ln(θ∗n ).

7.7 (I and J block-diagonal)Show that I and J have the block-diagonal form given in Theorem 7.5 when the distributionof ηt is symmetric.

7.8 (Forms of I and J in the AR(1)-ARCH(1) case)We consider the QML estimation of the AR(1)-ARCH(1) model

Xt = a0Xt−1 + εt , εt = σtηt , σ 2t = 1+ α0ε

2t−1,

assuming that ω0 = 1 is known and without specifying the distribution of ηt .

1. Give the explicit form of the matrices I and J in Theorem 7.5 (with an obvious adaptationof the notation because the parameter here is (a0, α0)).

2. Give the block-diagonal form of these matrices when the distribution of ηt is symmetric,and verify that the asymptotic variance of the estimator of the ARCH parameter

(i) doe not depend on the AR parameter, and

(ii) is the same as for the estimator of a pure ARCH (without the AR part).

3. Compute when α0 = 0. Is the asymptotic variance of the estimator of a0 the same asthat obtained when estimating an AR(1)? Verify the results obtained by simulation in thecorresponding column of Table 7.3.

7.9 (A useful result in showing asymptotic normality)Let (Jt (θ)) be a sequence of random matrices, which are function of a vector of parameters θ.We consider an estimator θn which strongly converges to the vector θ0. Assume that

1

n

n∑t=1

Jt (θ0)→ J, a.s.,

Page 199: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

GARCH QMLE 183

where J is a matrix. Show that if for all ε > 0 there exists a neighborhood V (θ0) of θ0

such that

limn→∞

1

n

n∑t=1

supθ∈V (θ0)

‖Jt (θ)− Jt (θ0)‖ ≤ ε, a.s., (7.95)

where ‖ · ‖ denotes a matrix norm, then

1

n

n∑t=1

Jt (θn)→ J, a.s.

Give an example showing that condition (7.95) is not necessary for the latter convergence tohold in probability.

7.10 (A lower bound for the asymptotic variance of the QMLE of an ARCH)Show that, for the ARCH(q) model, under the assumptions of Theorem 7.2,

Varas{√n(θn − θ0)} ≥ (κη − 1)θ0θ

′0,

in the sense that the difference is a positive semi-definite matrix.Hint : Compute θ0∂σ

2t (θ0)/∂θ

′ and show that J − J θ0θ′0J is a variance matrix.

7.11 (A striking property of J )For a GARCH(p, q) model we have, under the assumptions of Theorem 7.2,

J = E(ZtZ′t ), where Zt = 1

σ 2t (θ0)

∂σ 2t (θ0)

∂θ.

The objective of the exercise is to show that

�′J−1� = 1, where � = E(Zt ). (7.96)

1. Show the property in the ARCH case.Hint : Compute θ ′0Zt , θ ′0J and θ ′0J θ0.

2. In the GARCH case, let θ = (ω, α1, . . . , αq, 0, . . . , 0)′. Show that

θ′ ∂σ 2

t (θ)

∂θ ′= σ 2

t (θ).

3. Complete the proof of (7.96).

7.12 (A condition required for the generalized Bartlett formula)Using (7.24), show that if the distribution of ηt is symmetric and if E(ε4

t ) <∞, then formula(B.13) holds true, that is,

Eεt1εt2εt3εt4 = 0 when t1 = t2, t1 = t3 and t1 = t4.

7.13 (Constrained QMLE of the parameter α0 of a nonstationary ARCH(1) process)Jensen and Rahbek (2004a) consider the ARCH(1) model (7.15), in which the parameterω0 > 0 is assumed to be known (ω0 = 1 for instance) and where only α0 is unknown. Theywork with the constrained QMLE of α0 defined by

αcn(ω0) = arg minα∈[0,∞)

1

n

n∑t=1

t (α), t (α) = ε2t

σ 2t (α)

+ log σ 2t (α), (7.97)

Page 200: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

184 GARCH MODELS

where σ 2t (α) = ω0 + αε2

t−1. Assume therefore that ω0 = 1 and suppose that the nonstation-arity condition (7.16) id satisfied.

1. Verify that1√n

n∑t=1

∂α t (α0) = 1√

n

n∑t=1

(1− η2t )

ε2t−1

1+ α0ε2t−1

and thatε2t−1

1+ α0ε2t−1

→ 1

α0a.s. as t →∞.

2. Prove that1√n

n∑t=1

∂α t (α0)

L→ N(

0,κη − 1

α20

).

3. Determine the almost sure limit of

1

n

n∑t=1

∂2

∂α2 t (α0).

4. Show that for all α > 0, almost surely

supα≥α

∣∣∣∣∣ 1nn∑t=1

∂3

∂α3 t (α)

∣∣∣∣∣ = O(1).

5. Prove that if αcn = αcn(ω0)→ α0 almost surely (see Exercise 7.14) then

√n(αcn − α0

) L→ N {0, (κη − 1)α20

}.

6. Does the result change when αcn = αcn(1) and ω0 = 1?

7. Discuss the practical usefulness of this result for estimating ARCH models.

7.14 (Strong consistency of Jensen and Rahbek’s estimator)We consider the framework of Exercise 7.13, and follow the lines of the proof of (7.19) onpage 169.

1. Show that αcn(1) converges almost surely to α0 when ω0 = 1.

2. Does the result change if αcn(1) is replaced by αcn(ω) and if ω and ω0 are arbitrary positivenumbers? Does it entail the convergence result (7.19)?

Page 201: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

8

Tests Based on the Likelihood

In the previous chapter, we saw that the asymptotic normality of the QMLE of a GARCH modelholds true under general conditions, in particular without any moment assumption on the observedprocess. An important application of this result concerns testing problems. In particular, we are ableto test the IGARCH assumption, or more generally a given GARCH model with infinite variance.This problem is the subject of Section 8.1.

The main aim of this chapter is to derive tests for the nullity of coefficients. These tests arecomplex in the GARCH case, because of the constraints that are imposed on the estimates of thecoefficients to guarantee that the estimated conditional variance is positive. Without these con-straints, it is impossible to compute the Gaussian log-likelihood of the GARCH model. Moreover,asymptotic normality of the QMLE has been established assuming that the parameter belongs tothe interior of the parameter space (assumption A5 in Chapter 7). When some coefficients αi orβj are null, Theorem 7.2 does not apply. It is easy to see that, in such a situation, the asymptoticdistribution of

√n(θn − θ0) cannot be Gaussian. Indeed, the components θin of θn are constrained

to be positive or null. If, for instance, θ0i = 0 then√n(θin − θ0i ) = √nθin ≥ 0 for all n and the

asymptotic distribution of this variable cannot be Gaussian.Before considering significance tests, we shall therefore establish in Section 8.2 the asymptotic

distribution of the QMLE without assumption A5, at the cost of a moment assumption on theobserved process. In Section 8.3, we present the main tests (Wald, score and likelihood ratio)used for testing the nullity of some coefficients. The asymptotic distribution obtained for theQMLE will lead to modification of the standard critical regions. Two cases of particular interestwill be examined in detail: the test of nullity of only one coefficient and the test of conditionalhomoscedasticity, which corresponds to the nullity of all the coefficients αi and βj . Section 8.4is devoted to testing the adequacy of a particular GARCH(p, q) model, using portmanteau tests.The chapter also contains a numerical application in which the preeminence of the GARCH(1, 1)model is questioned.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 202: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

186 GARCH MODELS

8.1 Test of the Second-Order Stationarity Assumption

For the GARCH(p, q) model defined by (7.1), testing for second-order stationarity involves testing

H0 :q∑i=1

α0i +p∑j=1

β0j < 1 against H1 :q∑i=1

α0i +p∑j=1

β0j ≥ 1.

Introducing the vector c = (0, 1, . . . , 1)′ ∈ Rp+q+1, the testing problem is

H0 : c′θ0 < 1 against H1 : c′θ0 ≥ 1. (8.1)

In view of Theorem 7.2, the QMLE θn = (ωn, α1n, . . . , αqn, β1n, . . . βpn) of θ0 satisfies

√n(θn − θ0)

L→ N(0, (κη − 1)J−1),

under assumptions which are compatible with H0 and H1. In particular, if c′θ0 = 1 we have

√n(c′θn − 1)

L→ N(0, (κη − 1)c′J−1c).

It is thus natural to consider the Wald statistic

Tn =√n(∑q

i=1 αin +∑p

j=1 βjn − 1){(κη − 1)c′J−1c

}1/2 ,

where κη and J are consistent estimators in probability of κη and J . The following result followsimmediately from the convergence of Tn to N(0, 1) when c′θ0 = 1.

Proposition 8.1 (Critical region of stationarity test) Under the assumptions of Theorem 7.2, atest of (8.1) at the asymptotic level α is defined by the rejection region

{Tn >�−1(1− α)},where � is the N(0, 1) cumulative distribution function.

Table 8.1 Test of the infinite variance assumption for 11 stockmarket returns. Estimated standard deviations are in parentheses.

Index α + β p-value

CAC 0.983 (0.007) 0.0089DAX 0.981 (0.011) 0.0385DJA 0.982 (0.007) 0.0039DJI 0.986 (0.006) 0.0061DJT 0.983 (0.009) 0.0023DJU 0.983 (0.007) 0.0060FTSE 0.990 (0.006) 0.0525Nasdaq 0.993 (0.003) 0.0296Nikkei 0.980 (0.007) 0.0017SMI 0.962 (0.015) 0.0050S&P 500 0.989 (0.005) 0.0157

Page 203: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 187

Note that for most real series (see, for instance, Table 7.4), the sum of the estimated coefficients αand β is strictly less than 1: second-order stationarity thus cannot be rejected, for any reasonableasymptotic level (when Tn < 0, the p-value of the test is greater than 1/2). Of course, the non-rejection of H0 does not mean that the stationarity is proved. It is interesting to test the reverseassumption that the data generating process is an IGARCH, or more generally that it does not havemoments of order 2. We thus consider the problem

H0 : c′θ0 ≥ 1 against H1 : c′θ0 < 1. (8.2)

Proposition 8.2 (Critical region of nonstationarity test) Under the assumptions of Theorem7.2, a test of (8.2) at the asymptotic level α is defined by the rejection region

{Tn < �−1(α)}.

As an application, we take up the data sets of Table 7.4 again, and we give the p-values ofthe previous test for the 11 series of daily returns. For the FTSE (DAX, Nasdaq, S&P 500), theassumption of infinite variance cannot be rejected at the 5% (3%, 2%, 1%) level (see Table 8.1).The other series can be considered as second-order stationary (if one believes in the GARCH(1, 1)model, of course).

8.2 Asymptotic Distribution of the QML When θ0 isat the Boundary

In view of (7.3) and (7.9), the QMLE θn is constrained to have a strictly positive first compo-nent, while the other components are constrained to be positive or null. A general technique fordetermining the distribution of a constrained estimator involves expressing it as a function of theunconstrained estimator θ ncn (see Gourieroux and Monfort, 1995). For the QMLE of a GARCH,this technique does not work because the objective function

ln(θ) = n−1n∑t=1

t , t = t (θ) = ε2t

σ 2t

+ log σ 2t , where σ 2

t is defined by (7.4),

cannot be computed outside � (for an ARCH(1), it may happen that σt 2 := ω + α1ε2t is negative

when α1 < 0). It is thus impossible to define θ ncn .The technique that we will utilize here (see, in particular, Andrews, 1999), involves writing

θn with the aid of the normalized score vector, evaluated at θ0:

Zn := −J−1n n1/2 ∂ln(θ0)

∂θ, Jn = ∂2ln(θ0)

∂θ∂θ ′, (8.3)

with

ln(θ) = ln(θ; εn, εn−1 . . . , ) = n−1n∑t=1

t , t = t (θ) = ε2t

σ 2t

+ log σ 2t ,

where the components of ∂ln(θ0)/∂θ and of Jn are right derivatives (see (a) in the proof of Theorem8.1 on page 207).

In the proof of Theorem 7.2, we showed that

n1/2(θn − θ0) = Zn + oP (1), when θ0 ∈◦� . (8.4)

Page 204: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

188 GARCH MODELS

For any value of θ0 ∈ � (even when θ0 ∈◦�), it will be shown that the vector Zn is well defined

and satisfies

ZnL→ Z ∼ N {0, (κη − 1)J−1} , J = Eθ0

{1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ0)

}, (8.5)

provided J exists. By contrast, when

θ0 ∈ ∂� := {θ ∈ � : θi = 0 for some i > 1},equation (8.4) is no longer valid. However, we will show that the asymptotic distribution ofn1/2(θn − θ0) is well approximated by that of the vector n1/2(θ − θ0) which is located at theminimal distance of Zn, under the constraint θ ∈ �. Consider thus a random vector θJn(Zn) (whichis not an estimator, of course) solving the minimization problem

θJn(Zn) = arg infθ∈�

{Zn − n1/2(θ − θ0)

}′Jn{Zn − n1/2(θ − θ0)

}. (8.6)

It will be shown that Jn converges to the positive definite matrix J . For n large enough, we thushave

θJn(Zn) = arg infθ∈�

dist2Jn{Zn,

√n(�− θ0)

},

where distJn(x, y) := {(x − y)′Jn(x − y)}1/2

is a distance between two points x and y of Rp+q+1,and where the distance between a point x and a subset S of Rp+q+1 is defined by distJn(x, S) =infs∈S distJn (x, s).

We allow θ0 to have null components, but we do not consider the (less interesting) case whereθ0 reaches another boundary of �. More precisely, we assume that

B1: θ0 ∈ (ω, ω)× [0, θ2)× · · · × [0, θp+q+1) ⊂ �,

where 0 < ω < ω and 0 < min{θ2, . . . , θp+q+1}. In this case√n(θJn(Zn)− θ0) and

√n(θn − θ0)

belong to the ‘local parameter space’

! :=⋃n

√n(�− θ0) = !1 × · · · ×!q+q+1, (8.7)

where !1 = R and, for i = 2, . . . , p + q + 1, !i = R if θ0i = 0 and !i = [0,∞) if θ0i = 0. Withthe notation

λ!n = arg infλ∈!

{λ− Zn}′ Jn {λ− Zn} ,

we thus have, with probability 1,√n(θJn(Zn)− θ0) = λ!n , for n large enough. (8.8)

The vector λ!n is the projection of Zn on !, with respect to the norm ‖x‖2Jn

:= x′Jnx (seeFigure 8.1). Since ! is closed and convex, such a projection is unique. We will show that

n1/2(θn − θ0) = λ!n+oP (1). (8.9)

Since (Zn, Jn) tends in law to (Z, J ) and λ!n is a function of (Zn, Jn) which is continuouseverywhere except at the points where Jn is singular (that is, almost everywhere with respect to

the distribution of (Z, J ) because J is invertible), we have λ!nL→ λ!, where λ! is the solution of

limiting problem

λ! := arg infλ∈!

{λ− Z}′ J {λ− Z} , Z ∼ N {0, (κη − 1)J−1} . (8.10)

Page 205: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 189

n (a – a0)√

n (qn – q0)

a0 = 0

Jn

Zn

ln

n (a – a0)√

n (w – w0)√

n(w – w0)√

n(Θ – q0)√

Λ = × [0, ∞)

nw0 – w)−(√

Λ

Figure 8.1 ARCH(1) model with θ0 = (ω0, 0) and � = [ω,ω]× [0, α]:√n(�− θ0) =

[−√n(ω0 − ω),√n(ω − ω0)]× [0,

√n(α − α0)] is the gray area; Zn

L→ N {0, (κη − 1)J−1};√

n(θn − θ0) and λ!n have the same asymptotic distribution.

In addition to B1, we retain most of the assumptions of Theorem 7.2:

B2: θ0 ∈ � and � is a compact set.

B3: γ (A0) < 0 and for all θ ∈ �,∑p

j=1 βj < 1.

B4: η2t has a nondegenerate distribution with Eη2

t = 1.

B5: If p> 0, Aθ0(z) and Bθ0 (z) do not have common roots, Aθ0(1) = 0, and α0q + β0p = 0.

B6: κη = Eη4t <∞.

We also need the following moment assumption:

B7: Eε6t <∞.

When θ0 ∈◦�, we can show the existence of the information matrix

J = Eθ0

{1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ0)

}without moment assumptions similar to B7. The following example shows that, in the ARCH case,this is no longer possible when we allow θ0 ∈ ∂�.

Page 206: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

190 GARCH MODELS

Example 8.1 (The existence of J may require a moment of order 4) Consider the ARCH(2)model

εt = σtηt , σ 2t = ω0 + α01ε

2t−1 + α2ε

2t−2, (8.11)

where the true values of the parameters are such that ω0 > 0, α01 ≥ 0, α02 = 0, and the distributionof the iid sequence (ηt ) is defined, for a > 1, by

P(ηt = a) = P(ηt = −a) = 1

2a2, P(ηt = 0) = 1− 1

a2.

The process (εt ) is always stationary, for any value of α01 (since exp{−E(log η2

t )} = +∞, the

strict stationarity constraint (2.10) holds true). By contrast, εt does not possess moments of order2 when α01 ≥ 1 (see Proposition 2.2).

We have

1

σ 2t

∂σ 2t

∂α2(θ0) =

ε2t−2

ω0 + α01ε2t−1

,

so that

E

{1

σ 2t

∂σ 2t

∂α2(θ0)

}2

≥ E

{ ε2t−2

ω0 + α01ε2t−1

}2

| ηt−1 = 0

P(ηt−1 = 0)

= 1

ω20

(1− 1

a2

)E(ε4t−2

)because on the one hand ηt−1 = 0 entails εt−1 = 0, and on the other hand ηt−1 and εt−2 areindependent. Consequently, if Eε4

t = ∞ then the matrix J does not exist.

We then have the following result.

Theorem 8.1 (QML asymptotic distribution at the boundary) Under assumptions B1–B7,the asymptotic distribution of

√n(θn − θ0) is that of λ! satisfying (8.10), where ! is given

by (8.7).

Remark 8.1 (We retrieve the standard results in◦�) For θ0 ∈

◦�, the result is shown in

Theorem 7.2. Indeed, in this case ! = Rp+q+1 and

λ! = Z ∼ N {0, (κη − 1)J−1} .Theorem 8.1 is thus only of interest when θ0 is at the boundary ∂� of the parameter space.

Remark 8.2 (The moment condition B7 can sometimes be relaxed) Apart from the ARCH(q) case, it is sometimes possible to get rid of the moment assumption B7. Note that underthe condition γ (A0) < 0, we have σ 2

t (θ0) = b00 +∑∞

j=1 b0j ε2t−j with b00 > 0, b0j ≥ 0. The

derivatives ∂σ 2t /∂θk have the form of similar series. It can be shown that the ratio {∂σ 2

t /∂θ}/σ 2t

admits moments of all orders whenever any term ε2t−j which appears in the numerator is also

present in the denominator. This allows us to show (see the references at the end of the chapter)that, in the theorem, assumption B7 can be replaced by

B7:′ b0j > 0 for all j ≥ 1, where σ 2t (θ0) = b00 +

∑∞j=1 b0j ε

2t−j .

Page 207: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 191

Note that a sufficient condition for B7′

is α01 > 0 and β01 > 0 (because b0j ≥ α01βj−101 ). A nec-

essary condition is obviously that α01 > 0 (because b01 = α01). Finally, a necessary and sufficientcondition for B7′ is

{j | β0,j > 0} = ∅ andj0∏i=1

α0i > 0 for j0 = min{j | β0,j > 0}.

Obviously, according to Example 8.1, assumption B7′ is not satisfied in the ARCH case.

8.2.1 Computation of the Asymptotic DistributionIn this section, we will show how to compute the solutions of (8.10). Switching the componentsof θ , if necessary, it can be assumed without loss of generality that the vector θ (1)0 of the first d1

components of θ0 has strictly positive elements and that the vector θ(2)0 of the last d2 = p + q +1− d1 components of θ0 is null. This can be written as

Kθ0 = 0d2×1, K = (0d2×d1 , Id2

). (8.12)

More generally, it will be useful to consider all the subsets of these constraints. Let

K = {K1, . . . , K2d2−1

}be the set of the matrices obtained by deleting no, one, or several (but not all) rows of K . Note thatthe solution of the constrained minimization problem (8.10) is the unconstrained solution λ = Z

when the latter satisfies the constraint, that is, when

Z ∈ ! = Rd1 × [0,∞)d2 = {λ ∈ Rp+q+1 | Kλ ≥ 0}.

When Z /∈ !, the solution λ! coincides with that of an equality constrained problem of the form

λKi= arg min

λ:Kiλ=0(λ− Z)′J (λ− Z), Ki ∈ K.

An important difference, compared to the initial minimization program (8.10), is that theminimization is done here on a vectorial space. The solution is given by a projection (nonorthogonalwhen J is not the identity matrix). We thus obtain (see Exercise 8.1)

λKi= PiZ, where Pi = Ip+q+1 − J−1K ′

i

(KiJ

−1K ′i

)−1Ki (8.13)

is the projection matrix (orthogonal for the metric defined by J ) on the orthogonal subspace ofthe space generated by the rows of Ki . Note that λKi

does not necessarily belong to ! becauseKiλ = 0 does not imply that Kλ ≥ 0. Let C = {λKi

: Ki ∈ K and λKi≥ 0} be the class of the

admissible solutions. It follows that the solution that we are looking for is

λ! = Z 1l!(Z)+ 1l!c(Z)× arg minλ∈C

Q(λ) where Q(λ) = (λ− Z)′J (λ− Z).

This formula can be used in practice to obtain realizations of λ! from realizations of Z. TheQ(λKi

) can be obtained by writing

Q(PiZ) = Z′K ′i

(KiJ

−1K ′i

)−1KiZ. (8.14)

Page 208: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

192 GARCH MODELS

Another expression (of theoretical interest) for λ! is

λ! = Z 1lD0 (Z)+2d2−1∑i=1

PiZ 1lDi(Z)

where D0 = ! and the Di form a partition of Rp+q+1. Indeed, according to the zone to which Z

belongs, a solution λ! = λKiis obtained. We will make explicit these formulas in a few examples.

Let d = p + q + 1, z+ = z 1l(0,+∞)(z) and z− = z 1l(−∞,0)(z).

Example 8.2 (Law when only one component is at the boundary) When d2 = 1, that is,when only the last component of θ0 is zero, we have

! = Rd1 × [0,∞), K = (0, . . . , 0, 1), K = {K} ,

andλ! = Z 1l{Zd≥0} +PZ 1l{Zd<0}, P = Id − J−1K ′ (KJ−1K ′)−1

K.

We finally obtainλ! = Z − Z−d c,

where c is the last column of J−1 divided by the (d, d)th element of this matrix. Note that thelast component of λ! is Z+d . Noting that J−1 is, up to a multiplicative factor, the variance of Z,it can also be seen that

λ! =

Z1 − γ1Z

−d

...

Zp+q − γp+qZ−dZ+d

, γi = E(ZdZi)

Var(Zd). (8.15)

Thus λ!i = Zi if and only if Cov(Zi, Zd) = 0.

Example 8.3 (ARCH(2) model when the data generating process is a white noise) Consideran ARCH(2) model with θ0 = (ω0, 0, 0). We thus have d2 = 2, d1 = 1 and

! = R× [0,∞)2, K =(

0 1 00 0 1

), K = {K1,K2,K3}

with K1 = K , K2 = (0, 1, 0) and K3 = (0, 0, 1). Exercise 8.6 shows that

Z = Z1

Z2

Z3

∼ N0, = (κη − 1)J−1 =

(κη + 1)ω20 −ω0 −ω0

−ω0 1 0−ω0 0 1

.

Using KK ′ = I2 and KiK′i = 1 for i = 2, 3 in particular, we thus obtain

P1Z = (Z1 + ω0(Z2 + Z3), 0, 0)′ ,

P2Z = (Z1 + ω0Z2, 0, Z3)′ ,

P3Z = (Z1 + ω0Z3, Z2, 0)′ .

Page 209: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 193

Let P0 = I3 and K0 = 0. Using (8.14), we have

Q(PiZ) = (κη − 1)Z′K ′iKiZ

= (κη − 1)

0 for i = 0Z2

2 + Z23 for i = 1

Z22 for i = 2

Z23 for i = 3.

This shows that

λ! = Z1 + ωZ−2 + ωZ−3

Z+2Z+3

. (8.16)

In order to obtain a slightly simpler expression for the projections defined in (8.13), note thatthe constraint (8.12) can again be written as

θ0 = Hθ(1)0 , H =

(Id1

0d2×d1

). (8.17)

We define a dual of K, byH = {H0, . . . , H2d2−1

},

the set of the matrices obtained by deleting from 0 to d2 of the last d2 columns of the matrix Id .Note that the elements of H can always be numbered in such a way that H0 = Id corresponds to theabsence of constraint on θ0 and that, for i = 1, . . . , 2d2 − 1, the constraint Kiθ0 = 0 correspondsto the constraint θ0 = HiH

′i θ0. Exercise 8.2 then shows that

PiZ = Hi(H′i JHi)

−1H ′i JZ, (8.18)

for i = 0, . . . , 2d2 − 1 (with P0 = Id ). Note that (8.18) requires the inversion of only one matrixof size (d − k)× (d − k) (d − k being the number of columns of Hi ), whereas (8.13) requiresthe inversion of one matrix of size d × d and another matrix of size k × k. To illustrate this newformula, we return to our previous examples.

Example 8.4 (Example 8.2 continued) We have

H =(

Id1

01×d1

), H = {Id ,H }

and

P1Z = H(H ′JH

)−1H ′JZ =

(Z(1) + J−1

11 J12Z(2)

0

),

using the notation

J =(

J11 J12

J21 J22

), Z =

(Z(1)

Z(2)

),

where the matrix J11 is of size d1 × d1, the vectors J12 = J ′21 and Z(1) are of size d1 × 1, and J22

and Z(2) = Zd are scalars. We finally obtain

λ! = Z − Z−d

( −J−111 J12

1

),

Page 210: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

194 GARCH MODELS

which can be shown using (8.15) and

H ′J−1K(KJ−1K ′)−1 = −J−1

11 J12.

Example 8.5 (Example 8.3 continued) We have

H = 1

00

, H = {I3,H1, H2, H3} ,

with H1 = H ,

H2 = 1 0

0 00 1

and H3 = 1 0

0 10 0

.

In view of Exercise 8.6, we have

J = 1

ω20

1 ω0 ω0

ω0 ω20κη ω2

0ω0 ω2

0 ω20κη

,

which allows us to obtain

PiZ = Hi(H′i JHi)

−1H ′i JZ =

Z for i = 0(Z1 + ω0(Z2 + Z3), 0, 0)′ for i = 1(Z1 + ω0Z2, 0, Z3)

′ for i = 2(Z1 + ω0Z3, Z2, 0)′ for i = 3,

and to retrieve (8.16). Note that, in this example, the calculation are simpler with (8.13) than with

(8.18), because J−1 has a simpler expression than J.

8.3 Significance of the GARCH Coefficients

We make the assumptions of Theorem 8.1 and use the notation of Section 8.2.1. Assume θ(1)0 > 0,and consider the testing problem

H0 : θ(2)0 = 0 against H1 : θ (2)0 = 0. (8.19)

Recall that under H0, we have

√nθ (2)n

L→ Kλ!, K = (0d2×d1 , Id2

),

where the distribution of λ! is defined by

λ! := arg infλ∈!

{λ− Z}′ J {λ− Z} , Z ∼ N {0, (κη − 1)J−1} , (8.20)

with ! = Rd1 × [0,∞)d2 .

8.3.1 Tests and Rejection RegionsFor parametric assumptions of the form (8.19), the most popular tests are the Wald, score andlikelihood ratio tests.

Page 211: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 195

Wald Statistic

The Wald test looks at whether θ (2)n is close to 0. The usual Wald statistic is defined by

Wn = nθ (2)′

n

{KK ′}−1

θ (2)n ,

where is a consistent estimator of = (κη − 1)J−1.

Score (or Lagrange Multiplier or Rao) Statistic

Letθn|2 =

(θ(1)n|20

)denote the QMLE of θ constrained by θ (2) = 0. The score test aims to determine whether∂ln(θn|2

)/∂θ is not too far from 0, using a statistic of the form

Rn = n

κη|2 − 1

∂ ln(θn|2

)∂θ ′

J−1n|2

∂ ln(θn|2

)∂θ

,

where κη|2 and Jn|2 denote consistent estimators of κη and J .

Likelihood Ratio Statistic

The likelihood ratio test is based on the fact that under H0 : θ(2) = 0, the constrained (quasi)log-likelihood logLn

(θn|2

) = −(n/2)ln(θn|2

)should not be much smaller than the unconstrained

log-likelihood −(n/2)ln(θn). The test employs the statistic

Ln = n{ln(θn|2

)− ln(θn)}.

Usual Rejection Regions

From the practical viewpoint, the score statistic presents the advantage of only requiring constrainedestimation, which is sometimes much simpler than the unconstrained estimation required by thetwo other tests. The likelihood ratio statistic does not require estimation of the information matrixJ , nor the kurtosis coefficient κη. For each test, it is clear that the null hypothesis must be rejectedfor large values of the statistic. For standard statistical problems, the three statistics asymptoticallyfollow the same χ2

d2distribution under the null. At the asymptotic level α, the standard rejection

regions are thus

{Wn >χ2d2(1− α)}, {Rn >χ2

d2(1− α)}, {Ln >χ2

d2(1− α)}

where χ2d2(1− α) is the (1− α)-quantile of the χ2 distribution with d2 degrees of freedom. In

the case d2 = 1, for testing the significance of only one coefficient, the most widely used test isStudent’s t test, defined by the rejection region

{|tn|>�−1(1− α/2)}, (8.21)

where tn = √nθ (2)n

{KK ′}−1/2

. This test is equivalent to the standard Wald test because tn =√Wn (tn being here always positive or null, because of the positivity constraints of the QML

estimates) and {�−1(1− α/2)

}2 = χ21 (1− α).

Page 212: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

196 GARCH MODELS

Our testing problem is not standard because, by Theorem 8.1, the asymptotic distribution of θnis not normal. We will see that, among the previous rejection regions, only that of the score testasymptotically has the level α.

8.3.2 Modification of the Standard TestsThe following proposition shows that for the Wald and likelihood ratio tests, the asymptotic dis-tribution is not the usual χ2

d2under the null hypothesis. The proposition also shows that the

asymptotic distribution of the score test remains the χ2d2

distribution. The asymptotic distributionof Rn is not affected by the fact that, under the null hypothesis, the parameter is at the boundaryof the parameter space. These results are not very surprising. Take the example of an ARCH(1)with the hypothesis H0 : α0 = 0 of absence of ARCH effect. As illustrated by Figure 8.2, thereis a nonzero probability that α be at the boundary, that is, that α = 0. Consequently Wn = nα2

admits a mass at 0 and does not follow, even asymptotically, the χ21 law. The same conclusion

can be drawn for the likelihood ratio test. On the contrary, the score n1/2∂ln(θ0)/∂θ can take aswell positive or negative values, and does not seem to have a specific behavior when θ0 is atthe boundary.

Proposition 8.3 (Asymptotic distribution of the three statistics under H0) Under H0 and theassumptions of Theorem 8.1,

WnL→ W = λ!

′�λ!, (8.22)

RnL→ R ∼ χ2

d2, (8.23)

LnL→ L = −1

2(λ! − Z)′J (λ! − Z)+ 1

2Z′K ′ {KJ−1K ′}−1

KZ

= −1

2

{infKλ≥0

‖Z − λ‖2J − inf

Kλ=0‖Z − λ‖2

J

}, (8.24)

where � = K ′ {(κη − 1)KJ−1K ′}−1K and λ! satisfies (8.20).

a = a0 = 0a

a0 = 0 a

a∧ ∧

Figure 8.2 Concentrated log-likelihood (solid line) α �→ logLn(ω, α) for an ARCH(1) model.Assume there is no ARCH effect: the true value of the ARCH parameter is α0 = 0. In the configu-ration on the right, the likelihood maximum does not lie at the boundary and the three statistics Wn,Rn and Ln take strictly positive values. In the configuration on the left, we have Wn = nα2 = 0 andLn = 2 {logLn(ω, α)− logLn(ω, 0)} = 0, whereas Rn = {∂ logLn(ω, 0)/∂α}2 continues to take astrictly positive value.

Page 213: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 197

Remark 8.3 (Equivalence of the statistics Wn and Ln) Let κη be an estimator whichconverges in probability to κη. We can show that

Wn = 2

κη − 1Ln + oP (1)

under the null hypothesis. The Wald and likelihood ratio tests will thus have the same asymptoticcritical values, and will have the same local asymptotic powers (see Exercises 8.8 and 8.9, andSection 8.3.4). They may, however, have different asymptotic behaviors under nonlocal alternatives.

Remark 8.4 (Assumptions on the tests) In order for the Wald statistic to be well defined, �must exist, that is, J = J (θ0) must exist and must be invertible. This is not the case, in particular,for a GARCH(p, q) at θ0 = (ω0, 0, . . . , 0), when p = 0. It is thus impossible to carry out a Waldtest on the simultaneous nullity of all the αi and βj coefficients in a GARCH(p, q), p = 0. Theassumptions of Theorem 8.1 are actually required, in particular the identifiability assumptions. Itis thus impossible to test, for instance, an ARCH(1) against a GARCH(2, 1), but we can test, forinstance, an ARCH(1) against an ARCH(3).

A priori , the asymptotic distributions W and L depend on J , and thus on nuisance parameters.We will consider two particular cases: the case where we test the nullity of only one GARCHcoefficient and the case where we test the nullity of all the coefficients of an ARCH. In thetwo cases the asymptotic laws of the test statistics are simpler and do not depend on nuisanceparameters. In the second case, both the test statistics and their asymptotic laws are simplified.

8.3.3 Test for the Nullity of One CoefficientConsider the case d2 = 1, which is perhaps the most interesting case and corresponds to testingthe nullity of only one coefficient. In view of (8.15), the last component of λ! is equal to Z+d . Wethus have

WnL→ W =

(Kλ!

)2

KK ′ =(KZ)2

VarKZ1l{KZ≥0} =

(Z∗)2 1l{Z∗≥0}

where Z∗ ∼ N(0, 1). Using the symmetry of the Gaussian distribution, and the independencebetween Z∗2 and 1l{Z∗ > 0} when Z∗ follows the real normal law, we obtain

WnL→ W ∼ 1

2δ0 + 1

2χ2

1 (where δ0 denotes the Dirac measure at 0).

TestingH0 : θ (2)0 := θ0(p + q + 1) = 0 against H1 : θ0(p + q + 1)> 0,

can thus be achieved by using the critical region {Wn >χ21 (1− 2α)} at the asymptotic level

α ≤ 1/2. In view of Remark 8.3, we can define a modified likelihood ratio test of critical region{2Ln/(κη − 1)>χ2

1 (1− 2α)}. Note that the standard Wald test {Wn >χ2

1 (1− α)} has the asymp-totic level α/2, and that the asymptotic level of the standard likelihood ratio test

{Ln >χ2

1 (1− α)}

is much larger than α when the kurtosis coefficient κη is large. A modified version of the Studentt test is defined by the rejection region

{tn >�−1(1− α)}, (8.25)

Page 214: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

198 GARCH MODELS

We observe that commercial software – such as GAUSS, R, RATS, SAS and SPSS – do not usethe modified version (8.25), but the standard version (8.21). This standard test is not of asymptoticlevel α but only α/2. To obtain a t test of asymptotic level α it then suffices to use a test ofnominal level 2α.

Example 8.6 (Empirical behavior of the tests under the null) We simulated 5000 indepen-dent samples of length n = 100 and n = 1000 of a strong N(0, 1) white noise. On each realizationwe fitted an ARCH(1) model εt = {ω + αε2

t−1}1/2ηt , by QML, and carried out tests of H0 : α = 0against H1 : α > 0.

We began with the modified Wald test with rejection region{Wn = nα2 >χ2

1 (0.90) = 2.71}.

This test is of asymptotic level 5%. For the sample size n = 100, we observed a relative rejectionfrequency of 6.22%. For n = 1000, we observe a relative rejection frequency of 5.38%, whichis not significantly different from the theoretical 5%. Indeed, an elementary calculation showsthat, on 5000 independent replications of a same experiment with success probability 5%, thesuccess percentage should vary between 4.4% and 5.6% with a probability of approximately 95%.Figure 8.3 shows that the empirical distribution of Wn is quite close to the asymptotic distributionδ0/2+ χ2

1 /2, even for the small sample size n = 100.We then carried out the score test defined by the rejection region{

Rn = nR2 >χ21 (0.95) = 3.84

},

where R2 is the determination coefficient of the regression of ε2t on 1 and ε2

t−1. This test is alsoof asymptotic level 5%. For the sample size n = 100, we observed a relative rejection frequencyof 3.40%. For n = 1000, we observed a relative frequency of 4.32%.

We also used the modified likelihood ratio test. For the sample size n = 100, we observed arelative rejection frequency of 3.20%, and for n = 1000 we observed 4.14%.

On these simulation experiments, the Type I error is thus slightly better controlled by themodified Wald test than by the score and modified likelihood ratio tests.

Example 8.7 (Comparison of the tests under the alternative hypothesis) We implementedthe Wn, Rn and Ln tests of the null hypothesis H0 : α01 = 0 in the ARCH(1) model

2 4 6 8 10 12 14

0.05

0.1

0.15

0.2

2 4 6 8 10 12 14

0.05

0.1

0.15

0.2

0.25

Figure 8.3 Comparison between a kernel density estimator of the Wald statistic (dotted line)and the χ2

1 /2 density on [0.5,∞) (solid line) on 5000 simulations of an ARCH(1) process withα01 = 0: (left) for sample size n = 100; (right) for n = 1000.

Page 215: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 199

0.02 0.04 0.06 0.08 0.1 0.12 0.14

0.1

0.2

0.3

0.4

0.5

0.6

0.02 0.04 0.06 0.08 0.1 0.12 0.14

0.1

0.2

0.3

0.4

0.5

0.6

% reject % reject

a a

Figure 8.4 Comparison of the observed powers of the Wald test (thick line), of the score test(dotted line) and of the likelihood ratio test (thin line), as function of the nominal level α, on 5000simulations of an ARCH(1) process: (left) for n = 100 and α01 = 0.2; (right) for n = 1000 andα01 = 0.05.

εt = {ω + α01ε2t−1}1/2ηt , ηt ∼ N(0, 1). Figure 8.4 compares the observed powers of the three

tests, that is, the relative frequency of rejection of the hypothesis H0 that there is no ARCHeffect, on 5000 independent realizations of length n = 100 and n = 1000 of an ARCH(1) modelwith α01 = 0.2 when n = 100, and α01 = 0.05 when n = 1000. On these simulated series, themodified Wald test turns out to be the most powerful.

8.3.4 Conditional Homoscedasticity Tests with ARCH ModelsAnother interesting case is that obtained with d1 = 1, θ(1) = ω, p = 0 and d2 = q. This casecorresponds to the test of the conditional homoscedasticity null hypothesis

H0 : α01 = · · · = α0q = 0 (8.26)

in an ARCH(q) model{εt = σtηt , (ηt ) iid (0, 1)σ 2t = ω +∑q

i=1 αiε2t−i , ω> 0, αi ≥ 0.

(8.27)

We will see that for testing (8.26) there exist very simple forms of the Wald and score statistics.

Simplified Form of the Wald Statistic

Using Exercise 8.6, we have

(θ0) = (κη − 1)J (θ0)−1 =

(κη + q − 1)ω2 −ω · · · −ω

−ω...

−ωIq

.

Since KK ′ = Iq , we obtain a very simple form for the Wald statistic:

Wn = n

q∑i=1

α2i . (8.28)

Page 216: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

200 GARCH MODELS

Asymptotic Distribution W and L

A trivial extension of Example 8.3 yields

λ! =

Z1 + ω

∑di=2 Z

−i

Z+2...

Z+d

. (8.29)

The asymptotic distribution of n∑d

i=2 α2i is thus that of

d∑i=2

(Z+i)2,

where the Zi are independent N(0, 1). Thus, in the case where an ARCH(q) is fitted to a whitenoise we have

Wn = n

q∑i=1

α2i

L→ 1

2qδ0 +

q∑i=1

Ciq

1

2qχ2i . (8.30)

This asymptotic distribution is tabulated and the critical values are given in Table 8.2. In viewof Remark 8.3, Table 8.2 also yields the asymptotic critical values of the modified likelihoodratio statistic 2Ln/(κη − 1). Table 8.3 shows that the use of the standard χ2

q (1− α)-based criticalvalues of the Wald test would lead to large discrepancies between the asymptotic levels and thenominal level α.

Table 8.2 Asymptotic critical value cq,α , at level α, of the Wald test of rejection region {n∑q

i=1α2i > cq,α} for the conditional homoscedasticity hypothesis H0 : α1 = · · · = αq = 0 in an ARCH(q) model.

α (%)

q 0.1 1 2.5 5 10 15

1 9.5493 5.4119 3.8414 2.7055 1.6424 1.07422 11.7625 7.2895 5.5369 4.2306 2.9524 2.22603 13.4740 8.7464 6.8610 5.4345 4.0102 3.18024 14.9619 10.0186 8.0230 6.4979 4.9553 4.04285 16.3168 11.1828 9.0906 7.4797 5.8351 4.8519

Table 8.3 Exact asymptotic level (%) of erroneous Wald tests, of rejection region {n∑q

i=1α2i > χ2

q (1− α)}, under the conditional homoscedasticity assumption H0 : α1 = · · · = αq = 0 inan ARCH(q) model.

α (%)

q 0.1 1 2.5 5 10 15

1 0.05 0.5 1.25 2.5 5 7.52 0.04 0.4 0.96 1.97 4.09 6.323 0.02 0.28 0.75 1.57 3.36 5.294 0.02 0.22 0.59 1.28 2.79 4.475 0.01 0.17 0.48 1.05 2.34 3.81

Page 217: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 201

Score Test

For the hypothesis (8.26) that all the α coefficients of an ARCH(q) model are equal to zero, thescore statistic Rn can be simplified. To work within the linear regression framework, write

∂ ln(θ(1)n|2, 0)

∂θ= n−1X′Y,

where Y is the vector of length n of the ‘dependent’ variable 1− ε2t /ω, where X is the n× (q + 1)

matrix of the constant ω−1 (in the first column) and of the ‘explanatory’ variables ε2t−i ω

−1 (incolumn i + 1, with the convention εt = 0 for t ≤ 0), and ω = θ

(1)n|2 = n−1∑n

t=1 ε2t . Estimating

J (θ0) by n−1X′X, and κη − 1 by n−1Y ′Y , we obtain

Rn = nY ′X(X′X)−1X′Y

Y ′Y,

and one recognizes n times the coefficient of determination in the linear regression of Y on thecolumns of X. Since this coefficient is not changed by linear transformation of the variables (seeExercise 5.11), we simply have Rn = nR2, where R2 is the coefficient of determination in theregression of ε2

t on a constant and q lagged values ε2t−1, . . . , ε

2t−q . Under the null hypothesis of

conditional homoscedasticity, Rn asymptotically follows the χ2q law.

The previous simple forms of the Wald and score tests are obtained with estimators of J whichexploit the particular form of the matrix under the null. Note that there exist other versions of thesetests, obtained with other consistent estimators of J . The different versions are equivalent underthe null, but can have different asymptotic behaviors under the alternative.

8.3.5 Asymptotic Comparison of the TestsThe Wald and score tests that we have just defined are in general consistent, that is, their powersconverge to 1 when they are applied to a wide class of conditionally heteroscedastic processes. Anasymptotic study will be conducted via two different approaches: Bahadur’s approach compares therates of convergence to zero of the p-values under fixed alternatives, whereas Pitman’s approachcompares the asymptotic powers under a sequence of local alternatives, that is, a sequence ofalternatives tending to the null as the sample size increases.

Bahadur’s Approach

Let SW(t) = P(W> t) and SR(t) = P(R> t) be the asymptotic survival functions of the two teststatistics, under the null hypothesis H0 defined by (8.26). Consider, for instance, the Wald test.Under the alternative of an ARCH(q) which does not satisfy H0, the p-value of the Wald testSW(Wn) converges almost surely to zero as n→∞ because

Wn

n→

q∑i=1

α2i = 0.

The p-value of a test is typically equivalent to exp{−nc/2}, where c is a positive constant calledthe Bahadur slope. Using the fact that

log SW(x) ∼ logP(χ2q > x) x →∞, (8.31)

Page 218: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

202 GARCH MODELS

and that limx→∞ logP(χ2q > x) ∼ −x/2, the (approximate1) Bahadur slope of the Wald test

is thus

limn→∞−

2

nlog SW(Wn) =

q∑i=1

α2i , a.s.

To compute the Bahadur slope of the score test, note that we have the linear regression modelε2t = ω +A(B)ε2

t + νt , where νt = (η2t − 1)σ 2

t (θ0) is the linear innovation of ε2t . We then have

Rn

n= R2→1− Var(νt )

Var(ε2t ).

The previous limit is thus equal to the Bahadur slope of the score test. The comparison of the twoslopes favors the score test over the Wald test.

Proposition 8.4 Let (εt ) be a strictly stationary and nonanticipative solution of the ARCH(q)model (8.27), with E(ε4

t ) <∞ and∑q

i=1 α0i > 0. The score test is considered as more efficient thanthe Wald test in Bahadur’s sense because its slope is always greater or equal to that of the Waldtest, with equality when q = 1.

Example 8.8 (Slopes in the ARCH(1) and ARCH(2) cases) The slopes are the same in theARCH(1) case because

limn→∞

Wn

n= lim

n→∞Rn

n= α2

1 .

In the ARCH(2) case with fourth-order moment, we have

limn→∞

Wn

n= α2

1 + α22, lim

n→∞Rn

n= α2

1 + α22 +

2α21α2

1− α2.

We see that the second limit is always larger than the first. Consequently, in Bahadur’s sense, theWald and Rao tests have the same asymptotic efficiency in the ARCH(1) case. In the ARCH(2)case, the score test is, still in Bahadur’s sense, asymptotically more efficient than the Wald test fortesting the conditional homoscedasticity (that is, α1 = α2 = 0).

Bahadur’s approach is sometimes criticized for not taking account of the critical value of test, andthus for not really comparing the powers. This approach only takes into account the (asymptotic)distribution of the statistic under the null and the rate of divergence of the statistic under the alter-native. It is unable to distinguish a two-sided test from its one-sided counterpart (see Exercise 8.8).In this sense the result of Proposition 8.4 must be put into perspective.

Pitman’s Approach

In the ARCH(1) case, consider a sequence of local alternatives Hn(τ) : α1 = τ/√n. We can show

that under this sequence of alternatives,

Wn = nα21

L→ (U + τ)2 1l{U+τ > 0}, U ∼ N(0, 1).

Consequently, the local asymptotic power of the Wald test is

P (U + τ > c1) = 1−�(c1 − τ), c1 = �−1(1− α). (8.32)

1 The term ‘approximate’ is used by Bahadur (1960) to emphasize that the exact survival function SWn (t)

is approximated by the asymptotic survival function SW(t). See also Bahadur (1967) for a discussion on theexact and approximate slopes.

Page 219: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 203

The score test has the local asymptotic power

P{(U + τ)2 >c2

2) = 1−�(c2 − τ)+�(−c2 − τ), c2 = �−1(1− α/2) (8.33)

Note that (8.32) is the power of the test of the assumption H0 : θ = 0 against the assumptionH1 : θ = τ > 0, based on the rejection region of {X>c1} with only one observation X ∼ N(θ, 1).The power (8.33) is that of the two-sided test {|X|>c2}. The tests {X>c1} and {|X|>c2} have thesame level, but the first test is uniformly more powerful than the second (by the Neyman–Pearsonlemma, {X>c1} is even uniformly more powerful than any test of level less than or equal to α,for any one-sided alternative of the form H1). The local asymptotic power of the Wald test is thusuniformly strictly greater than that of Rao’s test for testing for conditional homoscedasticity in anARCH(1) model.

Consider the ARCH(2) case, and a sequence of local alternatives Hn(τ) : α1 = τ1/√n, α2 =

τ2/√n. Under this sequence of alternatives

Wn = n(α21 + α2

1)L→ (U1 + τ1)

2 1l{U1+τ1 > 0} +(U2 + τ2)2 1l{U2+τ2 > 0},

with (U1, U2)′ ∼ N(0, I2). Let c1 be the critical value of the Wald test of level α. The local

asymptotic power of the Wald test is

P(U1 + τ1 >

√c1)P (U2 + τ2 < 0)

+∫ ∞

−τ2

P{(U1 + τ1)

2 1l{U1+τ1 > 0}>c1 − (x + τ2)2}φ(x)dx

= {1−�(√c1 − τ1)

}�(−τ2)+ 1−�(−τ2 +√c1)

+∫ −τ2+√c1

−τ2

{1−�

(√c1 − (x + τ2)2 − τ1

)}φ(x)dx

Let c2 be the critical value of the Rao test of level α. The local asymptotic power of the Rao test is

P{(U1 + τ1)

2 + (U1 + τ2)2 >c2

},

where (U1 + τ1)2 + (U2 + τ2)

2 follows a noncentral χ2 distribution, with two degrees of freedomand noncentrality parameter τ2

1 + τ 22 . Figure 8.5 compares the powers of the two tests when τ1 = τ2.

0.5 1 1.5 2 2.5 3

0.2

0.4

0.6

0.8

1

t1 = t2

Figure 8.5 Local asymptotic power of the Wald test (solid line) and of the Rao test (dotted line)for testing for conditional homoscedasticity in an ARCH(2) model.

Page 220: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

204 GARCH MODELS

Thus, the comparison of the local asymptotic powers clearly favors the Wald test over thescore test, counterbalancing the result of Proposition 8.4.

8.4 Diagnostic Checking with Portmanteau Tests

To check the adequacy of a given time series model, for instance an ARMA(p, q) model, it iscommon practice to test the significance of the residual autocorrelations. In the GARCH frameworkthis approach is not relevant because the process ηt = εt /σt is always a white noise (possibly amartingale difference) even when the volatility is misspecified, that is, when εt =

√htηt with

ht = σ 2t . To check the adequacy of a volatility model, for instance a GARCH(p, q) of the form

(7.1), it is much more fruitful to look at the squared residual autocovariances

r(h) = 1

n

n∑t=|h|+1

(η2t − 1)(η2

t−|h| − 1), η2t =

ε2t

σ 2t

,

where |h| < n, σt = σt (θn), σt is defined by (7.4) and θn is the QMLE given by (7.9).For any fixed integer m, 1 ≤ m < n, consider the statistic rm = (r(1), . . . , r(m))′. Let κη and

J be weakly consistent estimators of κη and J . For instance, one can take

κη = 1

n

n∑t=1

ε4t

σ 4t (θn)

, J = 1

n

n∑t=1

1

σ 4t (θn)

∂σ 2t (θn)

∂θ

∂σ 2t (θn)

∂θ ′.

Define also the m× (p + q + 1) matrix Cm whose (h, k)th element, for 1 ≤ h ≤ m and 1 ≤ k ≤p + q + 1, is given by

Cm(h, k) = − 1

n

n∑t=h+1

(η2t−h − 1)

1

σ 2t (θn)

∂σ 2t (θn)

∂θk.

Theorem 8.2 (Asymptotic distribution of a portmanteau test statistic) Under the assump-tions of Theorem 7.2 ensuring the consistency and asymptotic normality of the QMLE,

nr ′mD−1rm

L→ χ2m,

with D = (κη − 1)2Im − (κη − 1)CmJ−1C ′m.

The adequacy of the GARCH(p,q) model is rejected at the asymptotic level α when{nr ′mD

−1rm >χ2m(1− α)

}.

8.5 Application: Is the GARCH(1,1) ModelOverrepresented?

The GARCH(1,1) model is by far the most widely used by practitioners who wish to estimatethe volatility of daily returns. In general, this model is chosen a priori , without implementingany statistical identification procedure. This practice is motivated by the common belief that the

Page 221: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 205

GARCH(1,1) (or its simplest asymmetric extensions) is sufficient to capture the properties offinancial series and that higher-order models may be unnecessarily complicated.

We will show that, for a large number of series, this practice is not always statistically justified.We consider daily and weekly series of 11 returns (CAC, DAX, DJA, DJI, DJT, DJU, FTSE,Nasdaq, Nikkei, SMI and S&P 500) and five exchange rates. The observations cover the periodfrom January 2, 1990 to January 22, 2009 for the daily returns and exchange rates, and fromJanuary 2, 1990 to January 20, 2009 for the weekly returns (except for the indices for which thefirst observations are after 1990). We begin with the portmanteau tests defined in Section 8.4.Table 8.4 shows that the ARCH models (even with large order q) are generally rejected, whereasthe GARCH(1,1) is only occasionally rejected. This table only concerns the daily returns, butsimilar conclusions hold for the weekly returns and exchange rates. The portmanteau tests areknown to be omnibus tests, powerful for a broad spectrum of alternatives. As we will now see,for the specific alternatives for which they are built, the tests defined in Section 8.3 (Wald, scoreand likelihood ratio) may be much more powerful.

The GARCH(1,1) model is chosen as the benchmark model, and is successively tested againstthe GARCH(1,2), GARCH(1,3), GARCH(1,4) and GARCH(2,1) models. In each case, the threetests (Wald, score and likelihood ratio) are applied. The empirical p-values are displayed inTable 8.5. This table shows that: (i) the results of the tests strongly depend on the alternative;

Table 8.4 Portmanteau test p-values for adequacy of the ARCH(5) and GARCH(1,1) modelsfor daily returns of stock market indices, based on m squared residual autocovariances. p-valuesless than 5% are in bold, those less than 1% are underlined.

m

1 2 3 4 5 6 7 8 9 10 11 12

Portmanteau tests for adequacy of the ARCH(5)CAC 0.194 0.010 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000DAX 0.506 0.157 0.140 0.049 0.044 0.061 0.080 0.119 0.140 0.196 0.185 0.237DJA 0.441 0.34 0.139 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000DJI 0.451 0.374 0.015 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000DJT 0.255 0.514 0.356 0.044 0.025 0.013 0.020 0.023 0.000 0.000 0.000 0.000DJU 0.477 0.341 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000FTSE 0.139 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000Nasdaq 0.025 0.031 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000Nikkei 0.004 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000SMI 0.502 0.692 0.407 0.370 0.211 0.264 0.351 0.374 0.463 0.533 0.623 0.700S&P 500 0.647 0.540 0.012 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Portmanteau tests for adequacy of the GARCH(1,1)CAC 0.312 0.379 0.523 0.229 0.301 0.396 0.495 0.578 0.672 0.660 0.704 0.743DAX 0.302 0.583 0.574 0.704 0.823 0.901 0.938 0.968 0.983 0.989 0.994 0.995DJA 0.376 0.424 0.634 0.740 0.837 0.908 0.838 0.886 0.909 0.916 0.938 0.959DJI 0.202 0.208 0.363 0.505 0.632 0.742 0.770 0.812 0.871 0.729 0.748 0.811DJT 0.750 0.100 0.203 0.276 0.398 0.518 0.635 0.721 0.804 0.834 0.885 0.925DJU 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000FTSE 0.733 0.940 0.934 0.980 0.919 0.964 0.328 0.424 0.465 0.448 0.083 0.108Nasdaq 0.523 0.024 0.061 0.019 0.001 0.001 0.002 0.001 0.002 0.001 0.001 0.002Nikkei 0.049 0.146 0.246 0.386 0.356 0.475 0.567 0.624 0.703 0.775 0.718 0.764SMI 0.586 0.758 0.908 0.959 0.986 0.995 0.996 0.999 0.999 0.999 0.999 0.999S&P 500 0.598 0.364 0.528 0.643 0.673 0.394 0.512 0.535 0.639 0.432 0.496 0.594

Page 222: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

206 GARCH MODELS

Table 8.5 p-values for tests of the null of a GARCH(1,1) model against the GARCH(1,2),GARCH(1,3), GARCH(1,4) and GARCH(2,1) alternatives, for returns of stock market indicesand exchange rates. p-values less than 5% are in bold, those less than 1% are underlined.

Alternative

GARCH(1,2) GARCH(1,3) GARCH(1,4) GARCH(2,1)

Wn Rn Ln Wn Rn Ln Wn Rn Ln Wn Rn Ln

Daily returns of indicesCAC 0.007 0.033 0.013 0.005 0.000 0.001 0.024 0.188 0.040 0.500 0.280 0.500DAX 0.002 0.001 0.003 0.001 0.000 0.000 0.001 0.162 0.014 0.350 0.031 0.143DJA 0.158 0.337 0.166 0.259 0.285 0.269 0.081 0.134 0.064 0.500 0.189 0.500DJI 0.044 0.100 0.049 0.088 0.071 0.094 0.107 0.143 0.114 0.500 0.012 0.500DJT 0.469 0.942 0.470 0.648 0.009 0.648 0.519 0.116 0.517 0.369 0.261 0.262DJU 0.500 0.000 0.500 0.643 0.000 0.643 0.725 0.001 0.725 0.017 0.000 0.005FTSE 0.080 0.122 0.071 0.093 0.223 0.083 0.213 0.423 0.205 0.458 0.843 0.442Nasdaq 0.469 0.922 0.468 0.579 0.983 0.578 0.683 0.995 0.702 0.500 0.928 0.500Nikkei 0.004 0.002 0.004 0.042 0.332 0.081 0.052 0.526 0.108 0.238 0.000 0.027SMI 0.224 0.530 0.245 0.058 0.202 0.063 0.086 0.431 0.108 0.500 0.932 0.500SP 500 0.053 0.079 0.047 0.089 0.035 0.078 0.055 0.052 0.043 0.500 0.045 0.500Weekly returns of indicesCAC 0.017 0.143 0.049 0.028 0.272 0.068 0.061 0.478 0.142 0.500 0.575 0.500DAX 0.154 0.000 0.004 0.674 0.798 0.674 0.667 0.892 0.661 0.043 0.000 0.000DJA 0.194 0.001 0.052 0.692 0.607 0.692 0.679 0.899 0.597 0.003 0.000 0.000DJI 0.173 0.000 0.030 0.682 0.482 0.682 0.788 0.358 0.788 0.000 0.000 0.000DJT 0.428 0.623 0.385 0.628 0.456 0.628 0.693 0.552 0.693 0.002 0.000 0.004DJU 0.500 0.747 0.500 0.646 0.011 0.646 0.747 0.038 0.747 0.071 0.003 0.017FTSE 0.188 0.484 0.222 0.183 0.534 0.214 0.242 0.472 0.272 0.500 0.532 0.500Nasdaq 0.441 0.905 0.448 0.387 0.868 0.412 0.199 0.927 0.266 0.069 0.961 0.344Nikkei 0.500 0.140 0.500 0.310 0.154 0.260 0.330 0.316 0.462 0.030 0.138 0.053SMI 0.500 0.720 0.500 0.217 0.144 0.150 0.796 0.754 0.796 0.314 0.769 0.360SP 500 0.117 0.000 0.001 0.659 0.114 0.659 0.724 0.051 0.724 0.000 0.000 0.000Daily exchange rates$/¤ 0.452 0.904 0.452 0.194 0.423 0.181 0.066 0.000 0.015 0.500 0.002 0.500�/¤ 0.037 0.000 0.002 0.616 0.090 0.618 0.304 0.000 0.227 0.136 0.000 0.000£/¤ 0.439 0.879 0.440 0.471 0.905 0.464 0.677 0.981 0.677 0.258 0.493 0.248CHF/¤ 0.141 0.000 0.012 0.641 0.152 0.641 0.520 0.154 0.562 0.012 0.000 0.000C$/¤ 0.500 0.268 0.500 0.631 0.714 0.631 0.032 0.000 0.002 0.045 0.045 0.029

(ii) the p-values of the three tests can be quite different; (iii) for most of the series, the GARCH(1,1)model is clearly rejected. Point (ii) is not surprising because the asymptotic equivalence betweenthe three tests is only shown under the null hypothesis or under local alternatives. Moreover,because of the positivity constraints, it is possible (see, for instance, the DJU) that the estimatedGARCH(1,2) model satisfies α2 = 0 with ∂ ln(θn|2)/∂α2 � 0. In this case, when the estimatorslie at the boundary of the parameter space and the score is strongly positive, the Wald and LRtests do not reject the GARCH(1,1) model, whereas the score does reject it. In other situations,the Wald or LR test rejects the GARCH(1,1) whereas the score does not (see, for instance, theDAX for the GARCH(1,4) alternative). This study shows that it is often relevant to employ severaltests and several alternatives. The conservative approach of Bonferroni (rejecting if the minimalp-value multiplied by the number of tests is less than a given level α), leads to rejection of the

Page 223: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 207

GARCH(1,1) model for 16 out of the 24 series in Table 8.5. Other procedures, less conservativethan that of Bonferroni, could also be applied (see Wright, 1992) without changing the generalconclusion.

In conclusion, this study shows that the GARCH(1,1) model is certainly overrepresented inempirical studies. The tests presented in this chapter are easily implemented and lead to selectionof GARCH models that are more elaborate than the GARCH(1,1).

8.6 Proofs of the Main Results*

Proof of Theorem 8.1We will split the proof into seven parts.

(a) Asymptotic normality of score vector. When θ0 ∈ ∂�, the function σ 2t (θ) can take nega-

tive values in a neighborhood of θ0, and t (θ) = ε2t /σ

2t (θ)+ log σ 2

t (θ) is then undefined in thisneighborhood. Thus the derivative of t (·) does not exist at θ0. By contrast the right derivativesexist, and the vector ∂ t (θ0)/∂θ of the right partial derivatives is written as an ordinary derivative.The same convention is used for the higher-order derivatives, as well as for the right derivativesof ln, t and ln at θ0. With these conventions, the formulas for the derivative of criterion remainvalid:

∂ t (θ)

∂θ={

1− ε2t

σ 2t

}{1

σ 2t

∂σ 2t

∂θ

},

∂2 t (θ)

∂θ∂θ ′={

1− ε2t

σ 2t

}{1

σ 2t

∂2σ 2t

∂θ∂θ ′

}+{

2ε2t

σ 2t

− 1

}{1

σ 2t

∂σ 2t

∂θ

}{1

σ 2t

∂σ 2t

∂θ ′

}. (8.34)

It is then easy to see that J = E∂2 t (θ0)/∂θ∂θ′ exists under the moment assumption B7. The

ergodic theorem immediately yields

Jn → J, almost surely, (8.35)

where J is invertible, by assumptions B4 and B5 (cf. Proof of Theorem 7.2). The conver-gence (8.5) then directly follows from Slutsky’s lemma and the central limit theorem given inCorollary A.1.

(b) Uniform integrability and continuity of the second-order derivatives. It will be shown that,for all ε > 0, there exists a neighborhood V(θ0) of θ0 such that, almost surely,

Eθ0 supθ∈V(θ0)∩�

∥∥∥∥∂2 t (θ)

∂θ∂θ ′

∥∥∥∥ <∞ (8.36)

and

limn→∞

1

n

n∑t=1

supθ∈V(θ0)∩�

∥∥∥∥∂2 t (θ)

∂θ∂θ ′− ∂2 t (θ0)

∂θ∂θ ′

∥∥∥∥ ≤ ε. (8.37)

Using elementary derivative calculations and the compactness of �, it can be seen that

∂σ 2t (θ)

∂θi= b

(i)0 (θ)+

∞∑k=1

b(i)k (θ)ε2

t−k and∂2σ 2

t (θ)

∂θi∂θj= b

(i,j)

0 (θ)+∞∑k=1

b(i,j)

k (θ)ε2t−k,

Page 224: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

208 GARCH MODELS

with

supθ∈�

|b(i)k (θ)| ≤ Kρk, supθ∈�

|b(i,j)k (θ)| ≤ Kρk,

where K > 0 and 0 < ρ < 1. Since supθ∈� 1/σ 2t (θ) ≤ K , assumption B7 then entails that

∥∥∥∥supθ∈�

1

σ 2t

∂σ 2t (θ)

∂θ

∥∥∥∥3

<∞,

∥∥∥∥supθ∈�

1

σ 2t

∂2σ 2t (θ)

∂θ∂θ ′

∥∥∥∥3

<∞,

∥∥∥∥supθ∈�

ε2t

σ 2t

∥∥∥∥3

<∞.

In view of (8.34), the Holder and Minkowski inequalities then show (8.36) for all neighborhoodof θ0. The ergodic theorem entails that

limn→∞

1

n

n∑t=1

supθ∈V(θ0)∩�

∥∥∥∥∂2 t (θ)

∂θ∂θ ′− ∂2 t (θ0)

∂θ∂θ ′

∥∥∥∥ = Eθ0 supθ∈V(θ0)∩�

∥∥∥∥∂2 t (θ)

∂θ∂θ ′− ∂2 t (θ0)

∂θ∂θ ′

∥∥∥∥ .This expectation decreases to 0 when the neighborhood V(θ0) decreases to the singleton {θ0}, whichshows (8.37).

(c) Convergence in probability of θJn(Zn) to θ0 at rate√n. In view of (8.35), for n large enough,

‖x‖Jn =(x ′Jnx

)1/2defines a norm. The definition (8.6) of θJn(Zn) entails that ‖√n(θJn(Zn)−

θ0)− Zn‖Jn ≤ ‖Zn‖Jn . The triangular inequality then implies that

‖√n(θJn(Zn)− θ0)‖Jn = ‖√n(θJn (Zn)− θ0)− Zn + Zn‖Jn

≤ ‖√n(θJn (Zn)− θ0)− Zn‖Jn + ‖Zn‖Jn≤ 2

(Z′nJnZn

)1/2 = OP (1),

where the last equality comes from the convergence in law of (Zn, Jn) to (Z, J ). This entails thatθJn(Zn)− θ0 = OP (n

−1/2).

(d) Quadratic approximation of the objective function. A Taylor expansion yields

ln(θ) = ln(θ0)+ ∂ ln(θ0)

∂θ ′(θ − θ0)+ 1

2(θ − θ0)

′[∂2 ln(θ∗ij )∂θ∂θ ′

](θ − θ0)

= ln(θ0)+ ∂ ln(θ0)

∂θ ′(θ − θ0)+ 1

2(θ − θ0)

′[∂2 ln(θ0)

∂θ∂θ ′

](θ − θ0)+ Rn(θ),

where

Rn(θ) = 1

2(θ − θ0)

′{[

∂2 ln(θ∗ij )∂θ∂θ ′

]− ∂2 ln(θ0)

∂θ∂θ ′

}(θ − θ0)

and θ∗ij is between θ and θ0. Note that (8.37) implies that, for any sequence (θn) such that θn − θ0 =OP (1), we have Rn(θn) = oP

(‖θn − θ0‖2). In particular, in view of (c), we have Rn

{θJn(Zn)

} =oP (n

−1). Introducing the vector Zn defined by (8.3), we can write

∂ln(θ0)

∂θ ′(θ − θ0) = − 1

nZ′nJn

√n(θ − θ0)

Page 225: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 209

and

ln(θ)− ln(θ0) = − 1

2nZ′nJn

√n(θ − θ0)− 1

2n

√n(θ − θ0)

′JnZn

+1

2(θ − θ0)

′Jn(θ − θ0)+ Rn(θ)+ R∗n(θ)

= 1

2n‖Zn −

√n(θ − θ0)‖2

Jn

− 1

2nZ′nJnZn + Rn(θ)+ R∗n(θ), (8.38)

where

R∗n(θ) ={∂ ln(θ0)

∂θ− ∂ln(θ0)

∂θ

}(θ − θ0)+ 1

2(θ − θ0)

′{∂2 ln(θ0)

∂θ∂θ ′− Jn

}(θ − θ0).

The initial conditions are asymptotically negligible, even when the parameter stays at the boundary.Result (d) of page 160 remaining valid, we have R∗n(θn) = oP

(n−1/2 ‖θn − θ0‖

)for any sequence

(θn) such that θn → θ0 in probability.

(e) Convergence in probability of θn to θ0 at rate n1/2. We know that√n(θn − θ0) = OP (1)

when θ0 ∈◦�. We will show that this result remains valid when θ0 ∈ ∂�. Theorem 7.1 applies.

In view of (d), the almost sure convergence of θn to θ0 and of Jn to the nonsingular matrix J ,we have

Rn(θn) = oP

(∥∥θn − θ0∥∥2)= oP

(∥∥θn − θ0∥∥2Jn

)and

R∗n(θn) = oP

(n−1/2

∥∥θn − θ0∥∥Jn

).

Since ln(·) is minimized at θn, we have

ln(θn)− ln(θ0) = 1

2n

{∥∥Zn −√n(θn − θ0)

∥∥2Jn− ‖Zn‖2

Jn

+oP(∥∥√n(θn − θ0)

∥∥2Jn

)+ oP

(∥∥√n(θn − θ0)∥∥Jn

)}≤ 0.

It follows that ∥∥Zn −√n(θn − θ0)

∥∥2Jn≤ ‖Zn‖2

Jn+ oP

(∥∥√n(θn − θ0)∥∥Jn

)+oP

(∥∥√n(θn − θ0)∥∥2Jn

)≤{‖Zn‖Jn + oP

(∥∥√n(θn − θ0)∥∥Jn

)}2,

where the last inequality follows from ‖Zn‖Jn = OP (1). The triangular inequality then yields

‖√n(θn − θ0)‖Jn ≤ ‖√n(θn − θ0)− Zn‖Jn + ‖Zn‖Jn

≤ 2‖Zn‖Jn + oP

(∥∥√n(θn − θ0)∥∥Jn

).

Thus ‖√n(θn − θ0)‖Jn {1+ oP (1)} ≤ 2‖Zn‖Jn = OP (1).

Page 226: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

210 GARCH MODELS

(f) Approximation of ‖Zn −√n(θn − θ0)‖2Jn

by ‖Zn − λ!n ‖2Jn

. We have

0 ≤ ∥∥Zn −√n(θn − θ0)

∥∥2Jn− ∥∥Zn −

√n(θJn(Zn)− θ0)

∥∥2Jn

= 2n{ln(θn)− ln(θJn(Zn))

}− 2n{(Rn + R∗n)(θn)− (Rn + R∗n)(θJn(Zn))

}≤ −2n

{(Rn + R∗n)(θn)− (Rn + R∗n)(θJn(Zn))

} = oP (1),

where the first line comes from the definition of θJn(Zn), the second line comes from (8.38), andthe inequality in third line follows from the fact that ln(·) is minimized at θn, the final equalityhaving been shown in (d). In view of (8.8), we conclude that∥∥Zn −

√n(θn − θ0)

∥∥2Jn− ∥∥Zn − λ!n

∥∥2Jn= oP (1). (8.39)

(g) Approximation of√n(θn − θ0) by λ!n . The vector λ!n , which is the projection of Zn on ! with

respect to the scalar product < x, y >Jn := x′Jny, is characterized (see Lemma 1.1 in Zarantonello,1971) by

λ!n ∈ !,⟨Zn − λ!n , λ

!n − λ

⟩Jn≥ 0, ∀λ ∈ !

(see Figure 8.1). Since θn ∈ � and ! = lim ↑ √n(�− θ0), we have almost surely√n(θn − θ0) ∈

! for n large enough. The characterization then entails∥∥√n(θn − θ0)− Zn

∥∥2Jn= ∥∥√n(θn − θ0)− λ!n

∥∥2Jn+ ∥∥λ!n − Zn

∥∥2Jn

+2⟨√n(θn − θ0)− λ!n , λ

!n − Zn

⟩Jn

≥ ∥∥√n(θn − θ0)− λ!n

∥∥2Jn+ ∥∥λ!n − Zn

∥∥2Jn.

Using (8.39), this yields∥∥√n(θn − θ0)− λ!n∥∥2Jn≤ ∥∥√n(θn − θ0)− Zn

∥∥2Jn− ∥∥λ!n − Zn

∥∥2Jn= oP (1),

which entails (8.9), and completes the proof.�

Proof of Proposition 8.3The first result is an immediate consequence of Slutsky’s lemma and of the fact that under H0,

√nθ (2)

′n = K

√n(θn − θ0)

L→ Kλ!, K ′ {(κη − 1)KJ−1K ′}−1K

P→ �.

To show (8.23) in the standard case where θ0 ∈◦�, the asymptotic χ2

d2distribution is established

by showing that Rn −Wn = oP (1). This equation does not hold true in our testing problemH0 : θ = θ0, where θ0 is on the boundary of �. Moreover, the asymptotic distribution of Wn

is not χ2d2

. A more direct proof is thus necessary.

Since θ (1)n|2 is a consistent estimator of θ(1)0 > 0, we have, for n large enough, θ (1)n|2 > 0 and

∂ ln(θn|2

)∂θi

= 0 for i = 1, . . . , d1.

Let

K∂ ln(θn|2

)∂θ

= 0,

Page 227: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 211

where K = (Id1 , 0d1×d2 ). We again obtain

∂ ln(θn|2

)∂θ

= K ′ ∂ ln(θn|2

)∂θ(2)

, K = (0d2×d1 , Id2). (8.40)

Since

∂2ln(θ0)

∂θ∂θ ′→ J almost surely, (8.41)

a Taylor expansion shows that

√n∂ ln(θn|2)

∂θ

oP (1)= √n∂ln(θ0)

∂θ+ J

√n(θn|2 − θ0

),

where ac= b means a = b + c. The last d2 components of this vectorial relation yield

√n∂ ln(θn|2)∂θ(2)

oP (1)= √n∂ln(θ0)

∂θ(2)+KJ

√n(θn|2 − θ0

), (8.42)

and the first d1 components yield

0oP (1)= √

n∂ln(θ0)

∂θ(1)+ KJ K ′√n

(θ(1)n|2 − θ

(1)0

),

using (θn|2 − θ0

) = K ′(θ(1)n|2 − θ

(1)0

). (8.43)

We thus have

√n(θ(1)n|2 − θ

(1)0

)oP (1)= − (KJ K ′)−1√

n∂ln(θ0)

∂θ(1). (8.44)

Using successively (8.40), (8.42) and (8.43), we obtain

Rn = n

κη − 1

∂ ln(θn|2)

∂θ(2)′ KJ−1K ′ ∂ ln

(θn|2)

∂θ(2)

oP (1)= n

κη − 1

{∂ln (θ0)

∂θ(2)′ +

(θ(1)′n|2 − θ

(1)′0

)KJK ′

}KJ−1K ′

×{∂ln (θ0)

∂θ(2)+KJ K ′

(θ(1)n|2 − θ

(1)0

)}.

Let

W =(

W1

W2

)∼ N

{0, J =

(J11 J12

J21 J22

)},

where W1 and W2 are vectors of respective sizes d1 and d2, and J11 is of size d1 × d1. Thus

KJK ′ = J21, KJK′ = J12, KJ K

′ = J11, KJ−1K ′ =

(J22 − J21J

−111 J12

)−1,

where the last equality comes from Exercise 6.7. Using (8.44), the asymptotic distribution of Rn

is thus that of (W2 − J21J

−111 W1

)′ (J22 − J21J

−111 J12

)−1 (W2 − J21J

−111 W1

)

Page 228: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

212 GARCH MODELS

which follows a χ2d2

because it is easy to check that

W2 − J21J−111 W1 ∼ N

(0, J22 − J21J

−111 J12

).

We have thus shown (8.23).Now we show (8.24). Using (8.43) and (8.44), several Taylor expansions yield

nln(θn|2

) oP (1)= nln (θ0)+ n∂ln (θ0)

∂θ ′(θn|2 − θ0

)+ n

2

(θn|2 − θ0

)′J(θn|2 − θ0

)oP (1)= nln (θ0)− n

2

∂ln (θ0)

∂θ(1)′(KJ K ′)−1 ∂ln(θ0)

∂θ(1)

andnln(θn) oP (1)= nln (θ0)+ n

∂ln (θ0)

∂θ ′(θn − θ0

)+ n

2

(θn − θ0

)′J(θn − θ0

).

By subtraction,

LnoP (1)= −n

{1

2

∂ln (θ0)

∂θ(1)′(KJ K ′)−1 ∂ln(θ0)

∂θ(1)+ ∂ln (θ0)

∂θ ′(θn − θ0

)+1

2

(θn − θ0

)′J(θn − θ0

)}.

It can be checked that √n

(∂ln(θ0)∂θ

θn − θ0

)L→( −JZ

λ!

).

Thus, the asymptotic distribution of Ln is that of

L = −1

2Z′J ′K ′J−1

11 KJZ + Z′J ′λ! − 1

2λ!

′Jλ!.

Moreover, it can easily be verified that

J ′K ′J−111 KJ = J − (κη − 1)� where (κη − 1)� =

(0 00 J22 − J21J

−111 J12

).

It follows that

L = −1

2Z′JZ + κη − 1

2Z′�Z + Z′J ′λ! − 1

2λ!

′Jλ!

= −1

2(λ! − Z)′J (λ! − Z)+ κη − 1

2Z′�Z,

which gives the first equality of (8.24). The second equality follows using Exercise 8.2. �

Proof of Proposition 8.4Since Cov(νt , σ 2

t ) = 0, we have

1− Var(νt )

Var(ε2t )= Var(σ 2

t )

Var(ε2t )= Var(

∑q

i=1 αiε2t−i )

Var(ε2t )

=q∑i=1

α2i + 2

∑i<j

αiαjCov(ε2

t−i , ε2t−j )

Var(ε2t )

and the result follows from ρε2 (i) ≥ 0 (see Proposition 2.2). �

Page 229: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 213

Proof of Theorem 8.2We first study the asymptotic impact of the unknown initial values on the statistic rm. Introducethe vector rm = (r(1), . . . , r(m))′, where

r(h) = n−1n∑

t=h+1

st st−h, with st = η2t − 1 and 0 < h < n.

Let st (θ) (st (θ)) be the random variable obtained by replacing ηt by ηt (θ) = εt /σt (θ) (ηt (θ) =εt/σt (θ)) in st . The vectors rm(θ) and rm(θ) are defined similarly, so that rm = rm(θ0) andrm = rm(θn). Write a

c= b when a = b + c. Using (7.30) and the arguments used to show (d) onpage 160, it can be shown that, as n→∞,

√n ‖rm − rm(θ0)‖ = oP (1), sup

θ∈�

∥∥∥∥∂rm(θ)∂θ ′− ∂ rm(θ)

∂θ ′

∥∥∥∥ = oP (1). (8.45)

We now show that the asymptotic distribution of√nrm is a function of the joint asymptotic

distribution of√nrm and of the QMLE. By the arguments used to show (c) on page 160, it can

be shown that there exists a neighborhood V(θ0) of θ0 such that

limn→∞E sup

θ∈V(θ0)

∥∥∥∥∂2rm(θ)

∂θi∂θj

∥∥∥∥ <∞ for all i, j ∈ {1, . . . , p + q + 1}. (8.46)

Using (8.45) and the fact that√n(θn − θ0) = OP (1), a Taylor expansion of rm(·) around θn and

θ0 shows that

√nrm = √

nrm(θ0)+ ∂ rm(θ∗)

∂θ ′√n(θn − θ0)

oP (1)= √nrm + ∂rm(θ

∗)∂θ ′

√n(θn − θ0)

for some θ∗ between θn and θ0. Using (8.46), the ergodic theorem, the strong consistency of theQMLE, and a second Taylor expansion, we obtain

∂rm(θ∗)

∂θ ′oP (1)= ∂rm(θ0)

∂θ ′→ Cm :=

c′1...

c′m

,

where

ch = E

{st−h

∂st (θ0)

∂θ

}= −E

{st−h

1

σ 2t (θ0)

∂σ 2t (θ0)

∂θ

}.

For the next to last equality, we use the fact that E {st ∂st−h(θ0)/∂θ} = 0. It follows that

√nrm

oP (1)= √nrm + Cm

√n(θn − θ0). (8.47)

We now derive the asymptotic distribution of√n(rm, θn − θ0). In the proof of Theorem 7.2,

it is shown that

√n(θn − θ0)

oP (1)= −J−1 1√n

n∑t=1

∂θ t (θ0)

L→ N {0, (κη − 1)J−1} (8.48)

Page 230: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

214 GARCH MODELS

as n→∞, where∂

∂θ t (θ0) = −st 1

σ 2t

∂σ 2t (θ0)

∂θ.

Note that rmoP (1)= n−1∑n

t=1 sts t−1:t−m where s t−1:t−m = (st−1, . . . , st−m)′. In view of (8.48), the

central limit theorem applied to the martingale difference{(

∂∂θ ′ t (θ0), sts

′t−1:t−m

)′ ; σ (ηu, u ≤ t)}

shows that

√n

(θn − θ0

rm

)oP (1)= 1√

n

n∑t=1

st

(J−1 1

σ 2t

∂σ2t (θ0)

∂θ

s t−1:t−m

)

L→ N{

0,

((κη − 1)J−1 θnrm

′θnrm

(κη − 1)2Im

)}, (8.49)

where

θnrm= (κη − 1)J−1E

1

σ 2t

∂σ 2t (θ0)

∂θs′t−1:t−m = −(κη − 1)J−1C′m.

Using (8.47) and (8.49) together, we obtain

√nrm

L→ N (0,D) , D = (κη − 1)2Im − (κη − 1)CmJ−1C′m.

We now show that D is invertible. Because the law of η2t is nondegenerate, we have κη > 1.

We thus have to show the invertibility of

(κη − 1)Im − CmJ−1C′m = EVV′, V = s−1:−m + CmJ

−1 1

σ 20

∂σ 20 (θ0)

∂θ.

If the previous matrix is singular then there exists λ = (λ1, . . . , λm)′ such that λ = 0 and

λ′V = λ′s−1:−m + µ′1

σ 20

∂σ 20 (θ0)

∂θ= 0 a.s., (8.50)

with µ = λ′CmJ−1. Note that µ = (µ1, . . . , µp+q+1)

′ = 0. Otherwise λ′s−1:−m = 0 a.s., whichimplies that there exists j ∈ {1, . . . , m} such that s−j is measurable with respect to σ {st , t = −j}.This is impossible because the st are independent and nondegenerate by assumption A3 on page 144(see Exercise 11.3). Denoting by Rt any random variable measurable with respect to σ {ηu, u ≤ t},we have

µ′∂σ 2

0 (θ0)

∂θ= µ2σ

2−1η

2−1 + R−2

and

σ 20 λ′s−1:−m =

(α1σ

2−1η

2−1 + R−2

) (λ1η

2−1 + R−2

) = λ1α1σ2−1η

4−1 + R−2η

2−1 + R−2.

Thus (8.50) entails thatλ1α1σ

2−1η

4−1 + R−2η

2−1 + R−2 = 0 a.s.

Solving this quadratic equation in η2−1 shows that either η2

−1 = R−2, which is impossible byarguments already given, or λ1α1 = 0. Let λ′2:m = (λ2, . . . , λm)

′. If λ1 = 0 then (8.50) implies that

α1λ′2:ms−2:−mη2

−1 = µ2η2−1 + R−2 a.s.

Page 231: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 215

Taking the expectation with respect to σ {ηt , t ≤ −2}, it can be seen that R−2 = α1λ′2:ms−2:−m − µ2

in the previous equality. Thus we have(α1λ

′2:ms−2:−m − µ2

) (η2−1 − 1

) = 0 a.s.

which entails α1 = µ2 = 0, because P{(λ2, . . . , λm)

′s−2:−m = 0}< 1 (see Exercise 8.12). For

GARCH(p, 1) models, it is impossible to have α1 = 0 by assumption A4. The invertibility of D isthus shown in this case. In the general case, we show by induction that (8.50) entails α1 = . . . αp.

It is easy to show that D→ D in probability (and even almost surely) as n→∞. Theconclusion follows. �

8.7 Bibliographical Notes

It is well known that when the parameter is at the boundary of the parameter space, the maximumlikelihood estimator does not necessarily satisfy the first-order conditions and, in general, does notadmit a limiting normal distribution. The technique, employed in particular by Chernoff (1954)and Andrews (1997) in a general framework, involves approximating the quasi-likelihood by aquadratic function, and defining the asymptotic distribution of the QML as that of the projection ofa Gaussian vector on a convex cone. Particular GARCH models are considered by Andrews (1997,1999) and Jordan (2003). The general GARCH(p, q) case is considered by Francq and Zakoıan(2007). A proof of Theorem 8.1, when the moment assumption B7 is replaced by assumption B7′ ofRemark 8.2, can be found in the latter reference. When the nullity of GARCH coefficients is tested,the parameter is at the boundary of the parameter space under the null, and the alternative is one-sided. Numerous works deal with testing problems where, under the null hypothesis, the parameteris at the boundary of the parameter space. Such problems have been considered by Chernoff (1954),Bartholomew (1959), Perlman (1969) and Gourieroux, Holly and Monfort (1982), among manyothers. General one-sided tests have been studied by, for instance, Rogers (1986), Wolak (1989),Silvapulle and Silvapulle (1995) and King and Wu (1997). Papers dealing more specifically withARCH and GARCH models are Lee and King (1993), Hong (1997), Demos and Sentana (1998),Andrews (2001), Hong and Lee (2001), Dufour et al. (2004) and Francq and Zakoıan (2009b).

The portmanteau tests based on the squared residual autocovariances were proposed by McLeodand Li (1983), Li and Mak (1994) and Ling and Li (1997). The results presented here closely followBerkes, Horvath and Kokoszka (2003a). Problems of interest that are not studied in this book arethe tests on the distribution of the iid process (see Horvath, Kokoszka and Teyssiere, 2004; Horvathand Zitikis, 2006).

Concerning the overrepresentation of the GARCH(1, 1) model in financial studies, we mentionStarica (2006). This paper highlights, on a very long S&P 500 series, the poor performance ofthe GARCH(1, 1) in terms of prediction and modeling, and suggests a nonstationary dynamics ofthe returns.

8.8 Exercises

8.1 (Minimization of a distance under a linear constraint)Let J be an n× n invertible matrix, let x0 be a vector of Rn, and let K be a full-rank p × n

matrix, p ≤ n. Solve the problem of the minimization of Q(x) = (x − x0)′J (x − x0) under

the constraint Kx = 0.

8.2 (Minimization of a distance when some components are equal to zero)Let J be an n× n invertible matrix, x0 a vector of Rn and p < n. Minimize Q(x) = (x −x0)

′J (x − x0) under the constraints xi1 = · · · = xip = 0 (xi denoting the ith component ofx, and assuming that 1 ≤ i1 < · · · < ip ≤ n).

Page 232: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

216 GARCH MODELS

8.3 (Lagrangian or method of substitution for optimization with constraints)Compare the solutions (C.13) and (C.14) of the optimization problem of Exercise 8.2, with

J = 2 −1 0−1 2 1

0 1 2

and the constraints

(a) x3 = 0,

(b) x2 = x3 = 0.

8.4 (Minimization of a distance under inequality constraints)Find the minimum of the function

λ = λ1

λ2

λ3

�→ Q(λ) = (λ− Z)′J (λ− Z), J = 2 −1 0−1 2 1

0 1 2

,

under the constraints λ2 ≥ 0 and λ3 ≥ 0, when

(i) Z = (−2, 1, 2)′,

(ii) Z = (−2,−1, 2)′,

(iii) Z = (−2, 1,−2)′,

(iv) Z = (−2,−1,−2)′.

8.5 (Influence of the positivity constraints on the moments of the QMLE)Compute the mean and variance of the vector λ! defined by (8.16). Compare these momentswith the corresponding moments of Z = (Z1, Z2, Z3)

′.

8.6 (Asymptotic distribution of the QMLE of an ARCH in the conditionally homoscedastic case)For an ARCH(q) model, compute the matrix involved in the asymptotic distribution ofthe QMLE in the case where all the α0i are equal to zero.

8.7 (Asymptotic distribution of the QMLE when an ARCH(1) is fitted to a strong white noise)

Let θ = (ω, α) be the QMLE in the ARCH(1) model εt =√ω + αε2

t−1ηt when the true

parameter is equal to (ω0, α0) = (ω0, 0) and when κη := Eη4t . Give an expression for the

asymptotic distribution of√n(θ − θ0) with the aid of

Z = (Z1, Z2)′ ∼ N

{0,

(ω2

0κη −ω0

−ω0 1

)}.

Compute the mean vector and the variance matrix of this asymptotic distribution. Determinethe density of the asymptotic distribution of

√n(ω − ω0). Give an expression for the kurtosis

coefficient of this distribution as function of κη.

8.8 (One-sided and two-sided tests have the same Bahadur slopes)Let X1, . . . , Xn be a sample from the N(θ, 1) distribution. Consider the null hypothesis H0 :θ = 0. Denote by � the N(0, 1) cumulative distribution function. By the Neyman–Pearsonlemma, we know that, for alternatives of the form H1 : θ > 0, the one-sided test of rejectionregion

C ={n−1/2

n∑i=1

Xi >�−1(1− α)

}

Page 233: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

TESTS BASED ON THE LIKELIHOOD 217

is uniformly more powerful than the two-sided test of rejection region

C∗ ={∣∣∣∣∣n−1/2

n∑i=1

Xi

∣∣∣∣∣>�−1(1− α/2)

}

(moreover, C is uniformly more powerful than any other test of level α or less). Althoughwe just have seen that the test C is superior to the test C∗ in finite samples, we will conductan asymptotic comparison of the two tests, using the Bahadur and Pitman approaches.

• The asymptotic Bahadur slope c(θ) is defined as the almost sure limit of −2/n times thelogarithm of the p-value under Pθ , when the limit exists. Compare the Bahadur slopes ofthe two tests.

• In the Pitman approach, we define a local power around θ = 0 as being the power at τ/√n.

Compare the local powers of C and C∗. Compare also the local asymptotic powers of thetwo tests for non-Gaussian samples.

8.9 (The local asymptotic approach cannot distinguish the Wald, score and likelihood ratio tests)Let X1, . . . , Xn be a sample of the N(θ, σ 2) distribution, where θ and σ 2 are unknown.Consider the null hypothesis H0 : θ = 0 against the alternative H1 : θ > 0. Consider thefollowing three tests:

• C1 ={Wn ≥ χ2

1 (1− α)}, where

Wn = nX2n

S2n

(S2n =

1

n

n∑i=1

(Xi − Xn

)2and Xn = 1

n

n∑i=1

Xi

)

is the Wald statistic;

• C2 ={Rn ≥ χ2

1 (1− α)}, where

Rn = nX2n

n−1∑n

i=1 X2i

is the Rao score statistic;

• C3 ={Ln ≥ χ2

1 (1− α)}, where

Ln = n logn−1∑n

i=1 X2i

S2n

is the likelihood ratio statistic.

Give a justification for these three tests. Compare their local asymptotic powers and theirBahadur slopes.

8.10 (The Wald and likelihood ratio statistics have the same asymptotic distribution)Consider the case d2 = 1, that is, the framework of Section 8.3.3 where only one coefficientis equal to zero. Without using Remark 8.3, show that the asymptotic laws W and L definedby (8.22) and (8.24) are such that

W = 2

κη − 1L.

Page 234: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

218 GARCH MODELS

8.11 (For testing conditional homoscedasticity, the Wald and likelihood ratio statistics have thesame asymptotic distribution)Repeat Exercise 8.10 for the conditional homoscedasticity test (8.26) in the ARCH(q) case.

8.12 (The product of two independent random variables is null if and only if one of the two variablesis null)Let X and Y be two independent random variables such that XY = 0 almost surely. Showthat either X = 0 almost surely or Y = 0 almost surely.

Page 235: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

9

Optimal Inference andAlternatives to the QMLE*

The most commonly used estimation method for GARCH models is the QML method studied inChapter 7. One of the attractive features of this method is that the asymptotic properties of theQMLE are valid under mild assumptions. In particular, no moment assumption is required on theobserved process in the pure GARCH case. However, the QML method has several drawbacks,motivating the introduction of alternative approaches. These drawbacks are the following: (i) theestimator is not explicit and requires a numerical optimization algorithm; (ii) the asymptotic nor-mality of the estimator requires the existence of a moment of order 4 for the noise ηt ; (iii) theQMLE is inefficient in general; (iv) the asymptotic normality requires the existence of momentsfor εt in the general ARMA-GARCH case; (v) a complete parametric specification is required.

In the ARCH case, the QLS estimator defined in Section 6.2 addresses point (i) satisfactorily,at the cost of additional moment conditions. The maximum likelihood (ML) estimator studied inSection 9.1 of this chapter provides an answer to points (ii) and (iii), but it requires knowledge ofthe density f of ηt . Indeed, it will be seen that adaptive estimators for the set of all the parametersdo not exist in general semi-parametric GARCH models. Concerning point (iii), it will be seen thatthe QML can sometimes be optimal outside of trivial case where f is Gaussian. In Section 9.2, theML estimator will be studied in the (quite realistic) situation where f is misspecified. It will alsobe seen that the so-called local asymptotic normality (LAN) property allows us to show the localasymptotic optimality of test procedures based on the ML. In Section 9.3, less standard estimatorsare presented, in order to address to some of the points (i)–(v).

In this chapter, we focus on the main principles of the estimation methods and do not give allthe mathematical details. Precise regularity conditions justifying the arguments used can be foundin the references that are given throughout the chapter or in Section 9.4.

9.1 Maximum Likelihood Estimator

In this section, the density f of the strong white noise (ηt ) is assumed known. This assumptionis obviously very strong and the effect of the misspecification of f will be examined in thenext section. Conditionally on the σ -field Ft−1 generated by {εu : u < t}, the variable εt has

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 236: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

220 GARCH MODELS

the density x → σt−1f (x/σt ). It follows that, given the observations ε1, . . . , εn, and the initial

values ε0, . . . , ε1−q , σ 20 , . . . , σ

21−p , the conditional likelihood is defined by

Ln,f (θ) = Ln,f (θ; ε1, . . . , εn) =n∏t=1

1

σtf

(εt

σt

),

where the σ 2t are recursively defined, for t ≥ 1, by

σ 2t = σ 2

t (θ) = ω +q∑i=1

αiε2t−i +

p∑j=1

βj σ2t−j . (9.1)

A maximum likelihood estimator (MLE) is obtained by maximizing the likelihood on a compactsubset �∗ of the parameter space. Such an estimator is denoted by θn,f .

9.1.1 Asymptotic BehaviorUnder the above-mentioned regularity assumptions, the initial conditions are asymptotically negli-gible and, using the ergodic theorem, we have almost surely

logLn,f (θ)

Ln,f (θ0)→ Eθ0 log

σt (θ0)

σt (θ)

f(

εtσt (θ)

)f(

εtσt (θ0)

) ≤ logEθ0

σt (θ0)

σt (θ)

f(

εtσt (θ)

)f(

εtσt (θ0)

) = 0,

using Jensen’s inequality and the fact that

Eθ0

σt (θ0)

σt (θ)

f(

εtσt (θ)

)f(

εtσt (θ0)

)∣∣∣∣∣∣Ft−1

=∫

1

σt (θ)f

(x

σt (θ)

)dx = 1.

Adapting the proof of the consistency of the QMLE, it can be shown that θn,f → θ0 almost surelyas n→∞.

Assuming in particular that θ0 belongs to the interior of the parameter space, a Taylor expan-sion yields

0 = 1√n

∂θlogLn,f (θ0)+ 1

n

∂2

∂θ∂θ ′logLn,f (θ0)

√n(θn,f − θ0

)+ oP (1). (9.2)

We have

∂θlogLn,f (θ0) = −

n∑t=1

1

2σ 2t

{1+ f ′ (ηt )

f (ηt )ηt

}∂σ 2

t

∂θ(θ0) :=

n∑t=1

νt . (9.3)

It is easy to see that (νt ,Ft ) is a martingale difference (using, for instance, the computations ofExercise 9.1). It follows that

Sn,f (θ0) := n−1/2 ∂

∂θlogLn,f (θ0)

L−→ N(0, I), (9.4)

Page 237: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 221

where I is the Fisher information matrix, defined by

I = Eν1ν′1 =

ζf

4E

1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ0), ζf =

∫ {1+ f ′(x)

f (x)x

}2

f (x)dx.

Note that ζf is equal to σ 2 times the Fisher information for the scale parameter σ > 0 of thedensities σ−1f (·/σ). When f is the N(0,1) density, we thus have ζf = σ 2 × 2/σ 2 = 2.

We now turn to the other terms of the Taylor expansion (9.2). Let

I(θ) = − 1

n

∂2

∂θ∂θ ′logLn,f (θ).

We have

I(θ0) = I+ oP (1), (9.5)

thus, using the invertibility of I,

n1/2 (θn,f − θ0) L−→ N {0,I−1} . (9.6)

Note that

θn,f = θ0 + I−1Sn,f (θ0)/√n+ oP (n

−1/2). (9.7)

With the previous notation, the QMLE has the asymptotic variance

I−1N := (Eη4

t − 1)

{E

1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ0)

}−1

=(∫

x4f (x)dx − 1)ζf

4I−1. (9.8)

The following proposition shows that the QMLE is not only optimal in the Gaussian case.

Proposition 9.1 (Densities ensuring the optimality of the QMLE) Under the previousassumptions, the QMLE has the same asymptotic variance as the MLE if and only if the density ofηt is of the form

f (y) = aa

�(a)exp(−ay2)|y|2a−1, a > 0, �(a) =

∫ ∞

0ta−1 exp(−t)dt. (9.9)

Proof. Given the asymptotic variances of the ML and QML estimators, it suffices to show that

(Eη4t − 1)ζf ≥ 4, (9.10)

with equality if and only if f satisfies (9.9). In view of Exercise 9.2, we have∫(y2 − 1)

(1+ f ′(y)

f (y)y

)f (y)dy = −2.

The Cauchy–Schwarz inequality then entails that

4 ≤∫(y2 − 1)2f (y)dy

∫ (1+ f ′(y)

f (y)y

)2

f (y)dy = (Eη4t − 1)ζf

Page 238: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

222 GARCH MODELS

with equality if and only if there exists a = 0 such that 1+ ηtf′(ηt )/f (ηt ) = −2a

(η2t − 1

)a.s.

This occurs if and only if f ′(y)/f (y) = −2ay + (2a − 1)/y almost everywhere. Under the con-straints f ≥ 0 and

∫f (y)dy = 1, the solution of this differential equation is (9.9). �

Note that when f is of the form (9.9) then we have

logLn,f (θ) = −an∑t=1

(ε2t

σ 2t (θ)

+ log σ 2t (θ)

)up to a constant which does not depend on θ . It follows that in this case the MLE coincides withthe QMLE, which entails the sufficient part of Proposition 9.1.

9.1.2 One-Step Efficient EstimatorFigure 9.1 shows the graph of the family of densities for which the QMLE and MLE coincide(and thus for which the QML is efficient). When the density f does not belong to this family ofdistributions, we have ζf

(∫x4f (x)dx − 1

)> 4, and the QMLE is asymptotically inefficient in the

sense that

Varas√n{θn − θ0

}− Varas√n{θn,f − θ0

} = (Eη41 − 1− 4

ζf

)J−1

is positive definite. Table 9.1 shows that the efficiency loss can be important.An efficient estimator can be obtained from a simple transformation of the QMLE, using the

following result (which is intuitively true by (9.7)).

−3 −2 −1 0 1 2 3

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0

a = 1/8a = 1/4a = 1/2a = 1a = 2

Figure 9.1 Density (9.9) for different values of a > 0. When ηt has this density, the QMLE andMLE have the same asymptotic variance.

Page 239: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 223

Table 9.1 Asymptotic relative efficiency (ARE) of the MLE with respect to the QMLE,Varas θn/Varas θn,f , when f (y) = √ν/ν − 2fν(y

√ν/ν − 2), where fν denotes the Student t

density with ν degrees of freedom.

ν 5 6 7 8 9 10 20 30 ∞ARE 5/2 5/3 7/5 14/11 6/5 15/13 95/92 145/143 1

Proposition 9.2 (One-step efficient estimator) Let θn be a preliminary estimator of θ0 such that√n(θn − θ0) = OP (1). Under the previous assumptions, the estimator defined by

θn,f = θn + I−1(θn)Sn,f (θn)/√n

is asymptotically equivalent to the MLE:

√n(θn,f − θ0

) L−→ N {0,I−1} .Proof. A Taylor expansion of Sn,f (·) around θ0 yields

Sn,f (θn) = Sn,f (θ0)− I√n(θn − θ0

)+ oP (1).

We thus have√n(θn,f − θ0

) = √n (θn,f − θn)+√n (θn − θ0

)= I−1Sn,f (θn)+ I−1 {Sn,f (θ0)− Sn,f (θn)

}+ op(1)

= I−1Sn,f (θ0)+ op(1)L−→ N {0,I−1} ,

using (9.4). �

In practice, one can take the QMLE as a preliminary estimator: θ = θn.

Example 9.1 (QMLE and one-step MLE) N = 1000 independent samples of length n = 100and 1000 were simulated for an ARCH(1) model with parameter ω = 0.2 and α = 0.9, where thedistribution of the noise ηt is the standardized Student t given by f (y) = √ν/ν − 2fν(y

√ν/ν − 2)

(fν denoting the Student density with ν degrees of freedom). Table 9.2 summarizes the estimationresults of the QMLE θn and of the efficient estimator θn,f . This table shows that the one-stepestimator θn,f is, for this example, always more accurate than the QMLE. The observed relativeefficiency is close to the theoretical ARE computed in Table 9.1.

9.1.3 Semiparametric Models and Adaptive EstimatorsIn general, the density f of the noise is unknown, but f and f ′ can be estimated from the normal-ized residuals ηt = εt/σt (θn), t = 1, . . . , n (for instance, using a kernel nonparametric estimator).The estimator θn,f (or the one-step estimator θn,f ) can then be utilized. This estimator is said tobe adaptive if it inherits the efficiency property of θn,f for any value of f . In general, it is notpossible to estimate all the GARCH parameters adaptively.

Take the ARCH(1) example {εt = σtηt , ηt ∼ fλσ 2t = ω + αε2

t−1,(9.11)

Page 240: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

224 GARCH MODELS

Table 9.2 QMLE and efficient estimator θn,f , on N = 1000 realizations of the ARCH(1) modelεt = σtηt , σ 2

t = ω + αε2t−1, ω = 0.2, α = 0.9, ηt ∼ f (y) = √ν/ν − 2fν(y

√ν/ν − 2). The last

column gives the estimated ARE obtained from the ratio of the MSE of the two estimators on theN realizations.

QMLE θn θn,f

ν n θ0 Mean RMSE Mean RMSE ARE

5 100 ω = 0.2 0.202 0.0794 0.211 0.0646 1.51α = 0.9 0.861 0.5045 0.857 0.3645 1.92

1000 ω = 0.2 0.201 0.0263 0.201 0.0190 1.91α = 0.9 0.897 0.1894 0.886 0.1160 2.67

6 100 ω = 0.2 0.212 0.0816 0.215 0.0670 1.48α = 0.9 0.837 0.3852 0.845 0.3389 1.29

1000 ω = 0.2 0.202 0.0235 0.202 0.0186 1.61α = 0.9 0.889 0.1384 0.888 0.1060 1.70

20 100 ω = 0.2 0.207 0.0620 0.209 0.0619 1.00α = 0.9 0.847 0.2899 0.845 0.2798 1.07

1000 ω = 0.2 0.199 0.0170 0.199 0.0165 1.06α = 0.9 0.899 0.0905 0.898 0.0885 1.05

For ν = 5, 6 and 20 the theoretical AREs are respectively 2.5, 1.67 and 1.03 (for α and ω).

where ηt has the double Weibull density

fλ(x) = λ

2|x|λ−1 exp

(−|x|λ) , λ> 0.

The subscript 0 is added to signify the true values of the parameters. The parameter ϑ0 = (θ ′0, λ0)′,

where θ0 = (ω0, α0)′, is estimated by maximizing the likelihood of the observations ε1, . . . , εn

conditionally on the initial value ε0. In view of (9.3), the first two components of the score aregiven by

∂θlogLn,fλ(ϑ0) = −

n∑t=1

1

2σ 2t

{1+ f ′λ (ηt )

fλ (ηt )ηt

}∂σ 2

t

∂θ(ϑ0),

with {1+ f ′λ (ηt )

fλ (ηt )ηt

}= λ0

(1− |ηt |λ0

),

∂σ 2t

∂θ=(

1ε2t−1

).

The last component of the score is

∂λlogLn,fλ(ϑ0) =

n∑t=1

{1

λ 0+ (1− |ηt |λ0

)log |ηt |

}.

Note that

Ifλ0= E

{λ0(1− |ηt |λ0

)}2 = λ20,

E

{1

λ 0+ (1− |ηt |λ0

)log |ηt |

}2

= 1− 2γ + γ 2 + π2/3

λ20

E

[{λ0(1− |ηt |λ0

)} {1

λ 0+ (1− |ηt |λ0

)log |ηt |

}]= 1− γ,

Page 241: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 225

where γ = 0.577 . . . is the Euler constant. It follows that the score satisfies

n−1/2 ∂

∂ϑlogLn,fλ(ϑ0)

L−→ N{

0, I =(

I11 I12

I′12 I22

)},

where

I11 = Eλ2

0

4(ω0 + α0ε2t−1)

2

(1 ε2

t−1ε2t−1 ε4

t−1

), I12 = γ − 1

2E

1

ω0 + α0ε2t−1

(1

ε2t−1

),

and I22 = λ−20

(1− 2γ + γ 2 + π2/6

). By the general properties of an information matrix (see

Exercise 9.4 for a direct verification), we also have

n−1 ∂2

∂ϑ∂ϑ ′logLn,fλ(ϑ0)→−I a.s. as n→∞.

The information matrix I being such that I12 = 0, the necessary Stein’s condition (seeBickel, 1982) for the existence of an adaptive estimator is not satisfied. The intuition behind thiscondition is the following. In view of the previous discussion, the asymptotic variance of theMLE of ϑ0 should be of the form

I−1 =(

I11 I12

I21 I22

).

When λ0 is unknown, the optimal asymptotic variance of a regular estimator of θ0 is thus I11.Knowing λ0, the asymptotic variance of the MLE of θ0 is I−1

11 . If there exists an adaptive estimatorfor the class of the densities of the form fλ (or for a larger class of densities), then we have

I11 = I−111 . Since I11 =

(I11 − I12I

−122 I21

)−1(see Exercise 6.7), this is possible only if I12 = 0,

which is not the case here.Reparameterizing the model, Drost and Klaassen (1997) showed that it is, however, possible

to obtain adaptative estimates of certain parameters. To illustrate this point, return to the ARCH(1)example with the parameterization {

εt = cσtηt , ηt ∼ fλσ 2t = 1+ αε2

t−1.(9.12)

Let ϑ = (α, c, λ) be an element of the parameter space. The score now satisfies

∂αlogLn,fλ(ϑ0) = −

n∑t=1

1

2σ 2t

{λ0(1− |ηt |λ0

)}ε2t−1,

∂clogLn,fλ(ϑ0) = −

n∑t=1

1

c0

{λ0(1− |ηt |λ0

)},

∂λlogLn,fλ(ϑ0) =

n∑t=1

{1

λ 0+ (1− |ηt |λ0

)log |ηt |

}.

Thus n−1/2∂ logLn,fλ0(ϑ0)/∂ϑ

L−→ N (0,I) with

I =

λ2

04 A

λ20

2c0B − 1−γ

2 B

λ20

2c0B

λ20c2

0− 1−γ

c0

− 1−γ2 B − 1−γ

c0

1−2γ+γ 2+π2/6λ2

0

,

Page 242: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

226 GARCH MODELS

where

A = Eε4t−1(

1+ α0ε2t−1

)2 , B = Eε2t−1

1+ α0ε2t−1

.

It can be seen that this matrix is invertible because its determinant is equal to π2λ20(A−

B2)/24c20 > 0. Moreover,

I−1 =

4

λ20(A−B2)

− 2c0B

λ20(A−B2)

0

− 2c0B

λ20(A−B2)

c20

[A{π2+6(1−γ )2

}−6B2(1−γ )2

]π2λ2

0(A−B2)

6c0(1−γ )π2

0 6c0(1−γ )π2

6λ20

π2

.

The MLE enjoying optimality properties in general, when λ0 is unknown, the optimal variance ofan estimator of (α0, c0) should be equal to

ML =

4

λ20(A−B2)

− 2c0B

λ20(A−B2)

− 2c0B

λ20(A−B2)

c20

[A{π2+6(1−γ )2

}−6B2(1−γ )2

]π2λ2

0(A−B2)

.

When λ0 is known, a similar calculation shows that the MLE of (α0, c0) should have the asymp-totic variance

ML|λ0 = λ2

04 A

λ20

2c0B

λ20

2c0B

λ20c2

0

−1

= 4

λ20(A−B2)

− 2c0B

λ20(A−B2)

− 2c0B

λ20(A−B2)

c20A

λ20(A−B2)

.

We note that ML|λ0(1, 1) = ML(1, 1). Thus, in presence of the unknown parameter c, the MLEof α0 is equally accurate when λ0 is known or unknown. This is not particular to the chosenform of the density of the noise, which leads us to think that there might exist an estimator ofα0 that adapts to the density f of the noise (in presence of the nuisance parameter c). Drost andKlaassen (1997) showed the actual existence of adaptive estimators for some parameters of anextension of (9.12).

9.1.4 Local Asymptotic NormalityIn this section, we will see that the GARCH model satisfies the so-called LAN property, which

has interesting consequences for the local asymptotic properties of estimators and tests. Let θn =θ + hn/

√n be a sequence of local parameters around the parameter θ ∈ ◦�, where (hn) is a bounded

sequence of Rp+q+1. Consider the local log-likelihood ratio function

hn → !n,f (θn, θ) := logLn,f (θn)

Ln,f (θ).

The Taylor expansion of this function around 0 leads to

!n,f (θ + hn/√n, θ) = h′nSn,f (θ)−

1

2h′nI(θ)hn + oPθ (1), (9.13)

where, as we have already seen,

Sn,f (θ)L−→ N {0,I(θ)} under Pθ . (9.14)

Page 243: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 227

It follows that

!n,f (θ + hn/√n, θ)

oPθ (1)∼ N(−1

2τn, τn

), τn = h′nI(θ)hn.

Remark 9.1 (Limiting Gaussian experiments) Denoting by L(h) the N {h,I−1(θ)}

densityevaluated at the point X, we have

!(h, 0) := logL(h)

L(0)= −1

2(X − h)′I(θ)(X − h)+ 1

2X′IX

= −1

2h′I(θ)h+ h′IX ∼ N

(−1

2h′I(θ)h, h′I(θ)h

)under the null hypothesis that X ∼ N {0, I−1(θ)

}. It follows that the local log-likelihood ratio

!n,f (θ + h/√n, θ) of n observations converges in law to the likelihood ratio !(h, 0) of one

Gaussian observation X ∼ N {h,I−1(θ)}. Using Le Cam’s terminology, we say that the ‘sequence

of local experiments’ {Ln,f (θ + h/√n), h ∈ Rp+q+1} converges to the Gaussian experiment

{N(h,I−1(θ)), h ∈ Rp+q+1}.

The property (9.13)–(9.14) is called LAN. It entails that the MLE is locally asymptoticallyoptimal (in the minimax sense and in various other senses; see van der Vaart, 1998). The LANproperty also makes it very easy to compute the local asymptotic distributions of statistics, or theasymptotic local powers of tests. As an example, consider tests of the null hypothesis

H0 : αq = α0 > 0

against the sequence of local alternatives

Hn : αq = α0 + c/√n.

The performance of the Wald, score and of likelihood ratio tests will be compared.

Wald Test Based on the MLE

Let αq,f be the (q + 1)th component of the MLE θn,f . In view of (9.7) and (9.13)–(9.14), wehave under H0 that( √

n(αq,f − α0)

!n,f (θ0 + h/√n, θ0)

)=(

e′q+1I−1/2X

h′I1/2X − h′Ih2

)′+ op(1), (9.15)

where X ∼ N(0, I1+p+q), and ei denotes the ith vector of the canonical basis of Rp+q+1, noting

that the (q + 1)th component of θ0 ∈◦� is equal to α0. Consequently, the asymptotic distribution

of the vector defined in (9.15) is

N{(

0− h′Ih

2

),

(e′q+1I

−1eq+1 e′q+1h

h′eq+1 h′Ih

)}under H0. (9.16)

Le Cam’s third lemma (see van der Vaart, 1998, p. 90; see also Exercise 9.3 below) and thecontiguity of the probabilities Pθ0 and Pθ0+h/

√n (implied by the LAN property (9.13)–(9.14))

show that, for e′q+1h = c,

√n(αq,f − α0)

L−→ N {c, e′q+1I−1eq+1

}under Hn.

Page 244: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

228 GARCH MODELS

The Wald test (and also the t test) is defined by the rejection region {Wn,f >χ21 (1− α)} where

Wn,f = n(αq,f − α0)2/{e′q+1I

−1(θn,f )eq+1}

and χ21 (1− α) denotes the (1− α)-quantile of a chi-square distribution with 1 degree of freedom.

This test has asymptotic level α and local asymptotic power c �→ 1−�c

{χ2

1 (1− α)}, where �c(·)

denotes the cumulative distribution function of a noncentral chi-square with 1 degree of freedomand noncentrality parameter1

c2

e′q+1I−1eq+1

.

This test is locally asymptotically uniformly most powerful among the asymptoticallyunbiased tests.

Score Test Based on the MLE

The score (or Lagrange multiplier) test is based on the statistic

Rn,f = 1

n

∂ logLn,f (θcn,f )

∂θ ′I−1(θ cn,f )

∂ logLn,f (θcn,f )

∂θ, (9.17)

where θ cn,f is the MLE under H0, that is, constrained by the condition that the (q + 1)th componentof the estimator is equal to α0. By the definition of θ cn,f , we have

∂ logLn,f (θcn,f )

∂θi= 0, i = q + 1. (9.18)

In view of (9.17) and (9.18), the test statistic can be written as

Rn,f = 1

n

{∂ logLn,f (θ

cn,f )

∂θq+1

}2

e′q+1I−1(θ cn,f )eq+1. (9.19)

Under H0, almost surely θ cn,f → θ0 and θn,f → θ0. Consequently,

0 = 1√n

∂ logLn,f (θn,f )

∂θ

oP (1)= 1√n

∂ logLn,f (θ0)

∂θ− I√n(θn,f − θ0)

and1√n

∂ logLn,f (θcn,f )

∂θ

oP (1)= 1√n

∂ logLn,f (θ0)

∂θ− I√n(θ cn,f − θ0).

Taking the difference, we obtain

1√n

∂ logLn,f (θcn,f )

∂θ

oP (1)= I√n(θn,f − θ cn,f ) (9.20)

oP (1)= I(θ cn,f )√n(θn,f − θ cn,f ),

1 By definition, if Z1, . . . , Zk are independently and normally distributed with variance 1 and meansm1, . . . , mk , then

∑ki=1 Z

2i follows a noncentral chi-square with k degrees of freedom and noncentrality

parameter∑k

i=1 m2i .

Page 245: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 229

which, using (9.17), gives

Rn,foP (1)= n(θn,f − θ cn,f )

′I(θ cn,f )(θn,f − θ cn,f ). (9.21)

From (9.20), we obtain√n(θn,f − θ cn,f )

oP (1)= I−1 1√n

∂ logLn,f (θcn,f )

∂θ.

Using this relation, e′q+1θcn,f = α0 and (9.18), it follows that

√n(αn,f − α0)

oP (1)= {e′q+1I

−1(θ0)eq+1} 1√

n

∂ logLn,f (θcn,f )

∂θq+1.

Using (9.19), we have

Rn,foP (1)= n

(αn,f − α0)2

e′q+1I−1(θ cn,f )eq+1

oP (1)= Wn,f under H0. (9.22)

By Le Cam’s third lemma, the score test thus inherits the local asymptotic optimality propertiesof the Wald test.

Likelihood Ratio Test Based on the MLE

The likelihood ratio test is based on the statistic Ln,f = 2!n,f (θn,f , θcn,f ). The Taylor expansion

of the log-likelihood around θn,f leads to

logLn,f (θcn,f )

oP (1)= logLn,f (θn,f )+ (θ cn,f − θn,f )′ ∂ logLn,f (θn,f )

∂θ

+1

2(θ cn,f − θn,f )

′ ∂2 logLn,f (θn,f )

∂θ∂θ ′(θ cn,f − θn,f ),

thus, using ∂ logLn,f (θn,f )/∂θ = 0, (9.5) and (9.21),

Ln,foP (1)= n(θcn,f − θn,f )

′I(θ cn,f − θn,f )oP (1)= Rn,f ,

under H0 and Hn. It follows that the three tests exhibit the same asymptotic behavior, both underthe null hypothesis and under local alternatives.

Tests Based on the QML

We have seen that the Wn,f , Rn,f and Ln,f tests based on the MLE are all asymptotically equivalentunder H0 and Hn (in particular, they are all asymptotically locally optimal). We now compare thesetests to those based on the QMLE, focusing on the QML Wald whose statistic is

Wn = n(αq − α0)2/e′q+1I

−1N (θn)eq+1,

where αq is the (q + 1)th component of the QML θn, and I−1N (θ) is

I−1N (θ) = 1

n

n∑t=1

{ε2t

σ 2t (θ)

− 1

}2{

1

n

n∑t=1

1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ)

}−1

or an asymptotically equivalent estimator.

Page 246: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

230 GARCH MODELS

Remark 9.2 Obviously, IN should not be estimated by the analog of (9.5),

I1 := − 1

n

n∑t=1

∂2 logLn(θn)

∂θ∂θ ′oP (1)= 1

2n

n∑t=1

1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θn), (9.23)

nor by the empirical variance of the (pseudo-)score vector

I2 := 1

n

n∑t=1

∂ logLn(θn)

∂θ

∂ logLn(θn)

∂θ ′

oP (1)= 1

4n

n∑t=1

(1− ε2

t

σ 2t (θn)

)21

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θn), (9.24)

which does not always converge to IN, when the observations are not Gaussian.

Remark 9.3 The score test based on the QML can be defined by means of the statistic

Rn = 1

n

{∂ logLn(θ

cn)

∂θq+1

}2

e′q+1I−1N (θ cn)eq+1,

denoting by θ cn the QML constrained by αq = α0. Similarly, we define the likelihood ratio teststatistic Ln based on the QML. Taylor expansions similar to those previously used show that

WnoP (1)= Rn

oP (1)= Ln under H0 and under Hn.

Recall that

√n(θn − θ0)

oP (1)= 2

{E

1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ0)

}−1 −1√n

n∑t=1

1

2σ 2t

{1− ε2

t

σ 2t

}∂σ 2

t

∂θ(θ0)

oP (1)= ζf

2I−1−1√

n

n∑t=1

1

2σ 2t

{1− ε2

t

σ 2t

}∂σ 2

t

∂θ(θ0).

Using (9.13)–(9.14), (9.8) and Exercise 9.2, we obtain

Covas{√

n(θn − θ0),!n,f (θ0 + h/√n, θ0)

}oP (1)= ζf

2I−1E

{(1− η2

t )(1+ ηtf ′(ηt )f (ηt )

)

}Eθ0

{1

4σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′

}h = h

under H0. The previous arguments, in particular Le Cam’s third lemma, show that

√n(αq − α0)

L−→ N{c, e′q+1I

−1N (θ0)eq+1

}under Hn.

The local asymptotic power of the{Wn >χ2

1 (1− α)}

test is thus c �→ 1−�c

{χ2

1 (1− α)}, where

the noncentrality parameter is

c = c2

e′q+1I−1N eq+1

= 4

(∫x4f (x)dx − 1)ζf

× c2

e′q+1I−1eq+1

.

Figure 9.2 displays the local asymptotic powers of the two tests, c �→ 1−�c

{χ2

1 (0.95)}

(solidline) and c �→ 1−�c

{χ2

1 (0.95)}

(dashed line), when f is the normalized Student t density with5 degrees of freedom and when θ0 is such that e′q+1I

−1N eq+1 = 4. Note that the local asymptotic

power of the optimal Wald test is sometimes twice as large as that of score test.

Page 247: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 231

2 4 6 8 10c

0.2

0.4

0.6

0.8

1

Figure 9.2 Local asymptotic power of the optimal Wald test{Wn,f >χ2

1 (0.95)}

(solid line) andof the standard Wald test

{Wn >χ2

1 (0.95)}

(dotted line), when f (y) = √ν/ν − 2fν(y√ν/ν − 2)

and ν = 5.

9.2 Maximum Likelihood Estimator withMisspecified Density

The MLE requires the (unrealistic) assumption that f is known. What happens when f is mis-specified, that is, when we use θn,h with h = f ?

In this section, the usual assumption Eη2t = 1 will be replaced by alternative moment assump-

tions that will be more relevant for the estimators considered. Under some regularity assumptions,the ergodic theorem entails that

θn,h = arg maxθ

Qn(θ), where Qn(θ)→ Q(θ) = Ef logσt (θ0)

σt (θ)h

(ηtσt (θ0)

σt (θ)

)a.s.

Here, the subscript f is added to the expectation symbol in order to emphasize the fact thatthe random variable η0 follows the distribution f , which does not necessarily coincide with the‘instrumental’ density h. This allows us to show that

θn,h → θ∗ = arg maxθ

Q(θ) a.s.

Note that the estimator θn,h can be seen as a non-Gaussian QMLE.

9.2.1 Condition for the Convergence of θn,h to θ0

Note that under suitable identifiability conditions, σt (θ0)/σt (θ) = 1 if and only if θ = θ0. Forthe consistency of the estimator (that is, for θ∗ = θ0), it is thus necessary for the function σ →Ef g(η0, σ ), where g(x, σ ) = log σh(xσ), to have a unique maximum at 1:

Ef g(η0, σ ) < Ef g(η0, 1) ∀σ > 0, σ = 1. (9.25)

Remark 9.4 (Interpretation of the condition) If the distribution of X has a density f , and if hdenotes any density, the quantity−2Ef logh(X) is sometimes called the Kullback–Leibler contrastof h with respect to f . The Jensen inequality shows that the contrast is minimal for h = f . Notethat hσ (x) = σh(xσ) is the density of Y/σ , where Y has density h. The condition thus signifiesthat the density h minimizes the Kullback–Leibler contrast of any density of the family hσ , σ > 0,with respect to the density f . In other words, the condition says that it is impossible to get closerto f by scaling h.

Page 248: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

232 GARCH MODELS

It is sometimes useful to replace condition (9.25) by one of its consequences that is easier tohandle. Assume the existence of

g1(x, σ ) = ∂g(x, σ )

∂σ= 1

σ+ h′(σx)

h(σx)x.

If there exists a neighborhood V (1) of 1 such that Ef supσ∈V (1) |g1(η0, σ )| <∞, the dominatedconvergence theorem shows that (9.25) implies the moment condition∫

h′(x)h(x)

xf (x)dx = −1. (9.26)

Obviously, condition (9.25) is satisfied for the ML,that is, when h = f (see Exercise 9.5), andalso for the QML, as the following example shows.

Example 9.2 (QML) When h is the N(0, 1) density, the estimator θn,h corresponds to the QMLE.In this case, if Eη2

t = 1, we have Ef g(η0, σ ) = −σ 2/2+ log σ − log√

2π , and this functionpossesses a unique maximum at σ = 1. We recall the fact that the QMLE is consistent even whenf is not the N(0, 1) density.

The following example shows that for condition (9.25) to be satisfied it is sometimes necessary toreparameterize the model and to change the identifiability constraint Eη2 = 1.

Example 9.3 (Laplace QML) Consider the Laplace density

h(y) = 1

2exp (−|y|) , λ> 0.

Then Ef g(η0, σ ) = −σE|η0| + log σ − log 2. This function possesses a unique maximum at σ =1/E|η0|. In order to have consistency for a large class of density f , it thus suffices to replaceto usual constraint Eη2

t = 1 in the GARCH model εt = σtηt by the new identifiability constraintE|ηt | = 1. Of course σ 2

t no longer corresponds to the conditional variance, but to the conditionalmoment σt = E (|εt | | εu, u < t).

The previous examples show that a particular choice of hcorresponds to a natural identifiabilityconstraint. This constraint applies to a moment of ηt (Eη2

t = 1 when h is N(0, 1), and E|ηt | = 1when h is Laplace). Table 9.3 gives the natural identifiability constraints associated with var-ious choices of h. When these natural identifiability constraints are imposed on the GARCHmodel, the estimator θn,h can be interpreted as a non-Gaussian QMLE, and converges to θ0, evenwhen h = f .

9.2.2 Reparameterization Implying the Convergence of θn,h to θ0

The following examples show that the estimator θn,h based on the misspecified density h = f

generally converges to a value θ∗ = θ0 when the model is not written with the natural identifiabil-ity constraint.

Example 9.4 (Laplace QML for a usual GARCH model) Take h to be the Laplace densityand assume that the GARCH model is of the form{

εt = σtηtσ 2t = ω0 +

∑q

i=1 α0iε2t−i +

∑p

j=1 β0j σ2t−j ,

with the usual constraint Eη2t = 1. The estimator θn,h does not converge to the parameter

θ0 = (ω0, α01, . . . , α0q, β01, . . . , β0p)′.

Page 249: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 233

Table 9.3 Identifiability constraint under which θn,h is consistent.

Law Instrumental density h Constraint

Gaussian 1√2πσ

exp{− (x−m)2

2σ2

}Eη2

t

σ 2 − mEηtσ 2 = 1

Double gamma bp

2�(p) |x|p−1 exp {−b|x|} , b, p> 0 E|ηt | = p

b

Laplace 12 exp {−|x|} E|ηt | = 1

Gamma bp

�(p)|x|p−1 exp {−b|x|} 1l(0,∞)(x) Eηt = p

b

Double inverse gamma bp

2�(p) |x|−p−1 exp {−b/|x|} E 1|ηt | =

p

b

Double inverse χ2 (σ2ν/2)ν/2

2�(ν/2) |x|−ν/2−1 exp{−νσ 2/2|x|} E 1

|ηt | = 1σ2

Double Weibull λ2 |x|λ−1 exp

(−|x|λ) , λ> 0 E|ηt |λ = 1

Gaussian generalized λ1−1/λ

2�(1/λ) exp(−|x|λ/λ) E|ηt |λ = 1

Inverse Weibull λ2 |x|−λ−1 exp

(−|x|−λ) , λ> 0 E|ηt |−λ = 1

Double log-normal 12|x|√2πσ

exp{− (log |x|−m)2

2σ 2

}E log |ηt | = m

The model can, however, always be rewritten as{εt = σ ∗t η∗tσ ∗2t = ω∗ +∑q

i=1 α∗i ε

2t−i +

∑p

j=1 β∗j σ

∗2t−j ,

with η∗t = ηt/$, σ ∗t = $σt and $ = ∫ |x|f (x)dx. Since E|η∗t | = 1, the estimator θn,h converges to

θ∗ = ($2ω0, $2α01, . . . , $

2α0q, β01, . . . , β0p)′.

Example 9.5 (QMLE of GARCH under the constraint E|ηt | = 1) Assume now that theGARCH model is of the form{

εt = σtηtσ 2t = ω0 +

∑q

i=1 α0iε2t−i +

∑p

j=1 β0j σ2t−j ,

with the constraint E|ηt | = 1. If h is the Laplace density, the estimator θn,h converges to theparameter θ0, regardless of the density f of ηt (satisfying mild regularity conditions). The modelcan be written as {

εt = σ ∗t η∗tσ ∗2t = ω∗ +∑q

i=1 α∗i ε

2t−i +

∑p

j=1 β∗j σ

∗2t−j ,

with the usual constraint Eη∗2t = 1 when η∗t = ηt/$, σ ∗t = $σt and $ =

√∫x2f (x)dx. The stan-

dard QMLE does not converge to θ0, but to θ∗ = ($2ω0, $2α01, . . . , $

2α0q, β01, . . . , β0p)′.

9.2.3 Choice of Instrumental Density h

We have seen that, for any fixed h, there exists an identifiability constraint implying the convergenceof θn,h to θ0 (see Table 9.3). In practice, we choose not the parameterization for which θn,hconverges but the estimator that guarantees a consistent estimation of the model of interest. Theinstrumental function h is chosen to estimate the model under a given constraint, corresponding

Page 250: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

234 GARCH MODELS

to a given problem. As an example, suppose that we wish to estimate the conditional momentEt−1

(ε4t

):= E

(ε4t | εu, u < t

)of a GARCH(p, q) process. It will be convenient to consider the

parameterization εt = σtηt under the constraint Eη4t = 1. The volatility σt will then be directly

related to the conditional moment of interest, by the relation σ 4t = Et−1

(ε4t

). In this particular

case, the Gaussian QMLE is inconsistent (because, in particular, the QMLE of αi converges toαiEη

2t ). In view of (9.26), to find relevant instrumental functions h, one can solve

1+ h′(x)h(x)

x = λ− λx4, λ = 0,

since E(λ− λη4t ) = 0 and E

{1+ h′(x)/h(ηt )ηt

} = 0. The densities that solve this differentialequation are of the form

h(x) = c|x|λ−1 exp(−λx4/4

), λ> 0.

For λ = 1 we obtain the double Weibull, and for λ = 4 a generalized Gaussian, which is inaccordance with the results given in Table 9.3.

For the more general problem of estimating conditional moments of |εt | or log |εt |, Table 9.4gives the parameterization (that is, the moment constraint on ηt ) and the type of estimator (that isthe choice of h) for the solution to be only a function of the volatility σt (a solution which is thusindependent of the distribution f of ηt ). It is easy to see that for the instrumental function h ofTable 9.4, the estimator θn,h depends only on r and not on c and λ. Indeed, taking the case r > 0,up to some constant we have

logLn,h(θ) = −λr

n∑t=1

(log σ rt (θ)+

∣∣∣∣ εt

σt (θ)

∣∣∣∣r) ,which shows that θn,h does not depend on c and λ. In practice, one can thus choose the simplestconstants in the instrumental function, for instance c = λ = 1.

9.2.4 Asymptotic Distribution of θn,h

Using arguments similar to those of Section 7.4, a Taylor expansion shows that, under (9.25),

0 = 1√n

∂θlogLn,h(θn,h)

= 1√n

∂θlogLn,h(θ0)+ 1

n

∂2

∂θ∂θ ′logLn,h(θ0)

√n(θn,h − θ0)+ oP (1),

where1√n

∂θlogLn,h(θ) = 1√

n

n∑t=1

∂θlog

1

σt (θ)h

(σt (θ0)

σt (θ)ηt

)

= 1√n

n∑t=1

g1

(ηt ,

σt (θ0)

σt (θ)

) −σt (θ0)

2σ 3t (θ)

∂σ 2t (θ)

∂θ

Table 9.4 Choice of h as function of the prediction problem.

Problem Constraint Solution Instrumental density h

Et−1 |εt |r , r > 0 E |ηt |r = 1 σ rt c|x|λ−1 exp (−λ|x|r/r) , λ> 0Et−1 |εt |−r E |ηt |−r = 1 σ−rt c|x|−λ−1 exp

(−λ|x|−r/r)Et−1 log |εt | E log |ηt | = 0 log σt

√λ/π |2x|−1 exp

{−λ(log |x|)2}

Page 251: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 235

and1

n

∂2

∂θ∂θ ′logLn,h(θ0) = 1

n

n∑t=1

g2 (ηt , 1)1

4σ 4t (θ0)

∂σ 2t (θ0)

∂θ

∂σ 2t (θ0)

∂θ ′+ oP (1).

The ergodic theorem and the CLT for martingale increments (see Section A.2) then entail that

√n(θn,h − θ0

) = 2

Eg2 (η0, 1)J−1 1√

n

n∑t=1

g1 (ηt , 1)1

σ 2t (θ0)

∂σ 2t (θ0)

∂θ+ op(1)

L→ N(0, 4τ 2h,f J

−1) (9.27)

where

J = E1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ0) and τ 2

h,f =Ef g

21(η0, 1){

Ef g2(η0, 1)}2 , (9.28)

with g1(x, σ ) = ∂g(x, σ )/∂σ and g2(x, σ ) = ∂g1(x, σ )/∂σ .

Example 9.6 (Asymptotic distribution of the MLE) When h = f (that is, for the MLE), wehave g(x, σ ) = log σf (xσ) and g1(x, σ ) = σ−1 + xf ′(xσ )/f (xσ). Thus Ef g

21(η0, 1) = ζf , as

defined on page 221. This Fisher information can also be expressed as ζf = −Ef g2(η0, 1). Thisshows that τ 2

f,f = ζ−1f , and we obtain (9.6).

Example 9.7 (Asymptotic distribution of the QMLE) If we choose for h the density φ(x) =(2π)−1/2 exp(−x2/2), we have g(x, σ ) = log σ − σ 2x2/2− log

√2π , g1(x, σ ) = σ−1 − σx2 and

g2(x, σ ) = −σ−2 − x2. Thus Ef g21(η0, 1) = (κη − 1), with κη =

∫x4f (x)dx, and Ef g2(η0, 1) =

−2. Therefore τ 2φ,f = (κη − 1)/4, and we obtain the usual expression for the asymptotic variance

of the QMLE.

Example 9.8 (Asymptotic distribution of the Laplace QMLE) Write the GARCH model asεt = σtηt , with the constraint Ef |ηt | = 1. Let (x) = 1

2 exp (−|x|) be the Laplace density. Forh = we have g(x, σ ) = log σ − σ |x| − log 2, g1(x, σ ) = σ−1 − |x| and g2(x, σ ) = −σ−2. Wethus have τ 2

,f = Ef η2t − 1.

Table 9.5 completes Table 9.1. Using the previous examples, this table gives the ARE of theQMLE and Laplace QMLE with respect to the MLE, in the case where f follows the Student tdistribution. The table does not allow us to obtain the ARE of the QMLE with respect to LaplaceQMLE, because the noise ηt has a different normalization with the standard QMLE or the LaplaceQMLE (in other words, the two estimators do not converge to the same parameter).

Table 9.5 Asymptotic relative efficiency of the MLE with respect to the QMLE and to theLaplace QMLE: τ 2

φ,f /τ2f,f and τ 2

,f /τ2f,f , where φ denotes the N(0, 1) density, and (x) =

12 exp (−|x|) the Laplace density. For the QMLE, the Student t density fν with ν degrees of free-dom is normalized so that Eη2

t = 1, that is, the density of ηt is f (y) = √ν/ν − 2fν(y√ν/ν − 2).

For the Laplace QMLE, ηt has the density f (y) = E|tν|fν(yE|tν |), so that E|ηt | = 1.

τ 2h,f /τ

2f,f ν

5 6 7 8 9 10 20 30 100

MLE – QMLE 2.5 1.667 1.4 1.273 1.2 1.154 1.033 1.014 1.001MLE – Laplace 1.063 1.037 1.029 1.028 1.030 1.034 1.070 1.089 1.124

Page 252: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

236 GARCH MODELS

9.3 Alternative Estimation Methods

The estimation methods presented in this section are less popular among practitioners than theQML and LS methods, but each has specific features of interest.

9.3.1 Weighted LSE for the ARMA ParametersConsider the estimation of the ARMA part of the ARMA(P,Q)-GARCH(p, q) model

Xt − c0 =P∑i=1

a0i (Xt−i − c0)+ et −Q∑j=1

b0j et−j

et =√htηt

ht = ω0 +q∑i=1

α0ie2t−i +

p∑j=1

β0jht−j ,

(9.29)

where (ηt ) is an iid(0,1) sequence and the coefficients ω0, α0i and β0j satisfy the usual positivityconstraints. The orders P,Q,p and q are assumed known. The parameter vector is

ϑ = (c, a1, . . . aP , b1, . . . , bQ)′,

the true value of which is denoted by ϑ0, and the parameter space � ⊂ RP+Q+1. Given observationsX1, . . . , Xn and initial values, the sequence εt is defined recursively by (7.22). The weighted LSEis defined as a measurable solution of

ϑn = arg minϑ

n−1n∑t=1

ω2t ε

2t (ϑ),

where the weights ωt are known positive measurable functions of Xt−1, Xt−2, . . . . One can take,for instance,

ω−1t = 1+

t−1∑k=1

k−1−1/s |Xt−k |

with E|X1|2s <∞ and s ∈ (0, 1). It can be shown that there exist constants K > 0 and ρ ∈ (0, 1)such that

|εt | ≤ K (1+ |ηt |)(

1+t−1∑k=1

ρk |Xt−k |)

and

∣∣∣∣ ∂εt∂ϑi

∣∣∣∣ ≤ K

t−1∑k=1

ρk |Xt−k |.

This entails that

|ωt εt | ≤ K (1+ |ηt |)(

1+∞∑k=1

k1+1/sρk

),

∣∣∣∣ωt ∂εt∂ϑi

∣∣∣∣ ≤ K

(1+

∞∑k=1

k1+1/sρk

).

Thus

E

∣∣∣∣ω2t εt

∂εt

∂ϑi

∣∣∣∣2 ≤ K4E (1+ |η1|)2( ∞∑k=1

k1+1/sρk

)4

<∞,

which implies a finite variance for the score vector ω2t εt ∂εt /∂ϑ . Ling (2005) deduces the asymptotic

normality of√n(ϑn − ϑ0), even in the case EX2

1 = ∞.

Page 253: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 237

9.3.2 Self-Weighted QMLERecall that, for the ARMA-GARCH models, the asymptotic normality of the QMLE has beenestablished under the condition EX4

1 <∞ (see Theorem 7.5). To obtain an asymptotically normalestimator of the parameter ϕ0 = (ϑ ′0, θ

′0)′ of the ARMA-GARCH model (9.29) with weaker moment

assumptions on the observed process, Ling (2007) proposed a self-weighted QMLE of the form

ϕn = arg minϕ∈�

n−1n∑t=1

ωt t (ϕ),

where t (ϕ) = ε2t (ϑ)/σ

2t (ϕ)+ log σ 2

t (ϕ), using standard notation. To understand the principleof this estimator, note that the minimized criterion converges to the limit criterion l(ϕ) =Eϕωt t (ϕ) satisfying

l(ϕ)− l(ϕ0) = Eϕ0ωt

{log

σ 2t (ϕ)

σ 2t (ϕ0)

+ σ 2t (ϕ0)

σ 2t (ϕ)

− 1

}+ Eϕ0ωt

{εt (ϑ)− εt (ϑ0)}2σ 2t (ϕ)

+Eϕ0ωt2ηtσt (ϕ0){εt (ϑ)− εt (ϑ0)}

σ 2t (ϕ)

.

The last expectation (when it exists) is null, because ηt is centered and independent of the othervariables. The inequality x − 1 ≥ log x entails that

Eϕ0ωt

{log

σ 2t (ϕ)

σ 2t (ϕ0)

+ σ 2t (ϕ0)

σ 2t (ϕ)

− 1

}≥ Eϕ0ωt

{log

σ 2t (ϕ)

σ 2t (ϕ0)

+ logσ 2t (ϕ0)

σ 2t (ϕ)

}= 0.

Thus, under the usual identifiability conditions, we have l(ϕ) ≥ l(ϕ0), with equality if and onlyif ϕ = ϕ0. Note that the orthogonality between ηt and the weights ωt is essential. Ling (2007)showed the convergence and asymptotic normality of ϕn under the assumption E|X1|s <∞ forsome s > 0.

9.3.3 Lp Estimators

The previous weighted estimator requires the assumption Eη41 <∞. Practitioners often claim that

financial series admit few moments. A GARCH process with infinite variance is obtained eitherby taking large values of the parameters, or by taking an infinite variance for ηt . Indeed, for aGARCH(1, 1) process, each of the two sets of assumptions

(i) α01 + β01 ≥ 1, Eη21 = 1,

(ii) Eη21 = ∞

implies an infinite variance for εt . Under (i), and strict stationarity, the asymptotic distribution ofthe QMLE is generally Gaussian (see Section 7.1.1), whereas the usual estimators have nonstandardasymptotic distributions under (ii) (see Berkes and Horvath, 2003b; Hall and Yao, 2003; Mikoschand Straumann, 2002), which causes difficulties for inference. As an alternative to the QMLE, itis thus interesting to define estimators having an asymptotic normal distribution under (ii), or evenin the more general situation where both α01 + β01 > 1 and Eη2

1 = ∞ are allowed. A GARCHmodel is usually defined under the normalization constraint Eη2

1 = 1. When the assumption thatEη2

1 exists is relaxed, the GARCH coefficients can be identified by imposing, for instance, that themedian of η2

1 is τ = 1. In the framework of ARCH(q) models, Horvath and Liese (2004) considerLp estimators, including the L1 estimator

arg minθ

n−1n∑t=1

ωt

∣∣∣∣∣ε2t − ω −

q∑i=1

αiε2t−1

∣∣∣∣∣ ,

Page 254: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

238 GARCH MODELS

where, for instance, ω−1t = 1+∑p

i=1 ε2t−i + ε4

t−i . When η2t admits a density, continuous and

positive around its median τ = 1, the consistency and asymptotic normality of these estimators areshown in Horvath and Liese (2004), without any moment assumption.

9.3.4 Least Absolute Value EstimationFor ARCH and GARCH models, Peng and Yao (2003) studied several least absolute deviationsestimators. An interesting specification is

arg minθ

n−1n∑t=1

∣∣log ε2t − log σ 2

t (θ)∣∣ . (9.30)

With this estimator it is convenient to define the GARCH parameters under the constraint thatthe median of η2

1 is 1. A reparameterization of the standard GARCH models is thus necessary.Consider, for instance, a GARCH(1, 1) with parameters ω, α1 and β1, and a Gaussian noise ηt .Since the median of η2

1 is τ = 0.4549 . . ., the median of the square of η∗t = ηt/√τ is 1, and the

model is rewritten asεt = σtη

∗t , σ 2

t = τω + τα1ε2t−1 + β1σ

2t−1.

It is interesting to note that the error terms log η∗2t = log ε2

t − log σ 2t (θ) are iid with median 0 when

θ = θ0. Intuitively, this is the reason why it is pointless to introduce weights in the sum (9.30).Under the moment assumption Eε2

1 <∞ and some regularity assumptions, Peng and Yao (2003)show that there exists a local solution of (9.30) which is weakly consistent and asymptoticallynormal, with rate of convergence n1/2. This convergence holds even when the distribution of theerrors has a fat tail: only the moment condition Eη2

t = 1 is required.

9.3.5 Whittle EstimatorIn Chapter 2 we have seen that, under the condition that the fourth-order moments exist, the squareof a GARCH(p, q) satisfies the ARMA(max(p, q), q) representation

φθ0(L)ε2t = ω0 + ψθ0 (L)ut , (9.31)

where

φθ0(z) = 1−max(p,q)∑

i=1

(α0i + β0i )zi, ψθ0(z) = 1−

p∑i=1

β0izi, ut = (η2

t − 1)σ 2t .

The spectral density of ε2t is

fθ0 (λ) =σ 2u

∣∣ψθ0(e−iλ)

∣∣2∣∣φθ0(e−iλ)

∣∣2 , σ 2u = Eu2

t .

Let γε2(h) be the empirical autocovariance of ε2t at lag h. At Fourier frequencies λj = 2πj/n ∈

(−π, π], the periodogram

In(λj ) =∑|h|<n

γε2(h)e−ihλj , j ∈ J ={[−n

2

]+ 1, . . . ,

[n2

]},

can be considered as a nonparametric estimator of 2πfθ0 (λj ). Let

ut (θ) = φθ (L)

ψθ (L)

{ε2t − ωφ−1

θ (1)}.

Page 255: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 239

It can be shown that

σ 2u (θ) := Eu2

t (θ) =σ 2u (θ0)

∫ π

−π

fθ0(λ)

fθ (λ)dλ ≥ σ 2

u (θ0),

with equality if and only if θ = θ0 (see Proposition 10.8.1 in Brockwell and Davis, 1991). In viewof this inequality, it is natural to consider the so-called Whittle estimator

arg minθ

1

n

∑j∈J

In(λj )

fθ (λj ).

For ARMA with iid innovations, the Whittle estimator has the same asymptotic behavior as theQMLE and LSE (which coincide in that case). For GARCH models, the Whittle estimator stillexhibits the same asymptotic behavior as the LSE, but it is generally less accurate than the QMLE.Moreover, Giraitis and Robinson (2001), Mikosch and Straumann (2002) and Straumann (2005)have shown that the consistency requires the existence of Eε4

t , and that the asymptotic normalityrequires Eε8

t <∞.

9.4 Bibliographical Notes

The central reference of Sections 9.1 and 9.2 is Berkes and Horvath (2004), who give preciseconditions for the consistency and asymptotic normality of the estimators θn,h. Slightly differentconditions implying consistency and asymptotic normality of the MLE can be found in Francqand Zakoıan (2006b). Additional results, in particular concerning the interesting situation wherethe density f of the iid noise is known up to a nuisance parameter, are available in Straumann(2005). The adaptative estimation of the GARCH models is studied in Drost and Klaassen (1997)and also in Engle and Gonzalez-Rivera (1991), Linton (1993), Gonzalez-Rivera and Drost (1999)and Ling and McAleer (2003b). Drost and Klaassen (1997), Drost, Klaassen and Werker (1997),Ling and McAleer (2003a) and Lee and Taniguchi (2005) give mild regularity conditions ensuringthe LAN property of GARCH.

Several estimation methods for GARCH models have not been discussed here, among themBayesian methods (see Geweke, 1989), the generalized method of moments (see Rich, Raymondand Butler, 1991), variance targeting (see Francq, Horvath and Zakoıan, 2009) and robust methods(see Muler and Yohai, 2008). Rank-based estimators for GARCH coefficients (except the intercept)were recently proposed by Andrews (2009). These estimators are shown to be asymptoticallynormal under assumptions which do not include the existence of a finite fourth moment for theiid noise.

9.5 Exercises

9.1 (The score of a scale parameter is centered)Show that if f is a differentiable density such that

∫ |x|f (x)dx <∞, then∫ {1+ f ′ (x/σt )

f (x/σt )

x

σt

}1

σtf

(x

σt

)dx = 0.

Deduce that the score vector defined by (9.3) is centered.

9.2 (Covariance between the square and the score of the scale parameter)Show that if f is a differentiable density such that

∫ |x|3f (x)dx <∞, then∫(1− x2)

(1+ x

f ′(x)f (x)

)f (x)dx = 2.

Page 256: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

240 GARCH MODELS

9.3 (Intuition behind Le Cam’s third lemma)Let φθ (x) = (2πσ 2)−1/2 exp(−(x − θ)2/2σ 2) be the N(θ, σ 2) density and let thelog-likelihood ratio

!(θ, θ0, x) = logφθ(x)

φθ0(x).

Determine the distribution of (aX + b

!(θ, θ0, X)

)when X ∼ N(θ0, σ

2), and then when X ∼ N(θ, σ 2).

9.4 (Fisher information)For the parametrization (9.11) on page 223, verify that

−n−1 ∂2

∂ϑ∂ϑ ′logLn,fλ(ϑ0)→ I a.s. as n→∞.

9.5 (Condition for the consistency of the MLE)Let η be a random variable with density f such that E|η|r <∞ for some r = 0. Show that

E logσf (ησ) < E log f (η) ∀σ = 1.

9.6 (Case where the Laplace QMLE is optimal)Consider a GARCH model whose noise has the �(b, b) distribution with density

fb(x) = bb

2�(b)|x|b−1 exp (−b|x|) , �(b) =

∫ ∞

0xb−1 exp (−x) dx,

where b> 0. Show that the Laplace QMLE is optimal.

9.7 (Comparison of the MLE, QMLE and Laplace QMLE)Give a table similar to Table 9.5, but replace the Student t distribution fν by the double�(b, p) distribution

fb,p(x) = bp

2�(p)|x|p−1 exp (−b|x|) , �(p) =

∫ ∞

0xp−1 exp (−x) dx,

where b, p > 0.

9.8 (Asymptotic comparison of the estimators θn,h)Compute the coefficient τ 2

h,f defined by (9.28) for each of the instrumental densities h ofTable 9.4. Compare the asymptotic behavior of the estimators θn,h.

9.9 (Fisher information at a pseudo-true value)Consider a GARCH(p, q) model with parameter

θ0 = (ω0, α01, . . . , α0q , β01, . . . , β0p)′.

1. Give an example of an estimator which does not converge to θ0, but which converges toa vector of the form

θ∗ = ($2ω0, $2α01, . . . , $

2α0q, β01, . . . , β0p)′,

where $ is a constant.

2. What is the relationship between σ 2t (θ0) and σ 2

t (θ∗)?

Page 257: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

OPTIMAL INFERENCE AND ALTERNATIVES TO THE QMLE* 241

3. Let !$ = diag($−2Iq+1, Ip) and

J (θ) = E1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ).

Give an expression for J (θ∗) as a function of J (θ0) and !$.

9.10 (Asymptotic distribution of the Laplace QMLE)Determine the asymptotic distribution of the Laplace QMLE when the GARCH model doesnot satisfy the natural identifiability constraint E|ηt | = 1, but the usual constraint Eη2

t = 1.

Page 258: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 259: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Part III

Extensions and Applications

Page 260: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 261: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

10

Asymmetries

Classical GARCH models, studied in Parts I and II, rely on modeling the conditional variance asa linear function of the squared past innovations. The merits of this specification are its abilityto reproduce several important characteristics of financial time series – succession of quiet andturbulent periods, autocorrelation of the squares but absence of autocorrelation of the returns,leptokurticity of the marginal distributions – and the fact that it is sufficiently simple to allow foran extended study of the probability and statistical properties.

From an empirical point of view, however, the classical GARCH modeling has an importantdrawback. Indeed, by construction, the conditional variance only depends on the modulus of thepast variables: past positive and negative innovations have the same effect on the current volatility.This property is in contradiction to many empirical studies on series of stocks, showing a negativecorrelation between the squared current innovation and the past innovations: if the conditionaldistribution were symmetric in the past variables, such a correlation would be equal to zero.However, conditional asymmetry is a stylized fact: the volatility increase due to a price decreaseis generally stronger than that resulting from a price increase of the same magnitude.

The symmetry property of standard GARCH models has the following interpretation in termsof autocorrelations. If the law of ηt is symmetric, and under the assumption that the GARCHprocess is second-order stationary, we have

Cov(σt , εt−h) = 0, h> 0, (10.1)

because σt is an even function of the εt−i , i > 0 (see Exercise 10.1). Introducing the positive andnegative components of εt ,

ε+t = max(εt , 0), ε−t = min(εt , 0),

it is easily seen that (10.1) holds if and only if

Cov(ε+t , εt−h) = Cov(ε−t , εt−h) = 0, h> 0. (10.2)

This characterization of the symmetry property in terms of autocovariances can be easily testedempirically, and is often rejected on financial series. As an example, for the log-returns series(εt = log(pt/pt−1)) of the CAC 40 index presented in Chapter 1, we get the results shown inTable 10.1.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 262: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

246 GARCH MODELS

Table 10.1 Empirical autocorrelations (CAC 40 series, period 1988–1998).

h 1 2 3 4 5 10 20 40

ρ(εt , εt−h) 0.030 0.005 −0.032 0.028 −0.046* 0.016 0.003 −0.019ρ(|εt |, |εt−h|) 0.090* 0.100* 0.118* 0.099* 0.086* 0.118* 0.055* 0.032ρ(ε+t , εt−h) 0.011 −0.094* −0.148* −0.018 −0.127* −0.039* −0.026 −0.064*

∗indicate autocorrelations which are statistically significant at the 5% level, using 1/n as an approximation ofthe autocorrelations variances, for n = 2385.

The absence of significant autocorrelations of the returns and the correlation of their modulusor squares, which constitute the basic properties motivating the introduction of GARCH models,are clearly shown for these data. But just as evident is the existence of an asymmetry in the impactof past innovations on the current volatility. More precisely, admitting that the process (εt ) issecond-order stationary and can be decomposed as εt = σtηt , where (ηt ) is an iid sequence and σtis a measurable, positive function of the past of εt , we have

ρ(ε+t , εt−h) = KCov(σt , εt−h) = K[Cov(σt , ε+t−h)+ Cov(σt , ε

−t−h)]

where K > 0. For the CAC data, except when h = 1 for which the autocorrelation is not significant,the estimates of ρ(ε+t , εt−h) seem to be significantly negative.1 Thus

Cov(σt , ε+t−h) < Cov(σt ,−ε−t−h),

which can be interpreted as a higher impact of the past price decreases on the current volatility,compared to the past price increases of the same magnitude. This phenomenon, Cov(σt , εt−h) < 0,is known in the finance literature as the leverage effect :2 volatility tends to increase dramaticallyfollowing bad news (that is, a fall in prices), and to increase moderately (or even to diminish)following good news.

The models we will consider in this chapter allow this asymmetry property to be incorporated.

10.1 Exponential GARCH Model

The following definition for the exponential GARCH (EGARCH) model mimics that given for thestrong GARCH.

Definition 10.1 (EGARCH(p, q) process) Let (ηt ) be an iid sequence such that E(ηt ) = 0 andVar(ηt ) = 1. Then (εt ) is said to be an exponential GARCH (EGARCH(p, q)) process if it satisfiesan equation of the form{

εt = σtηtlog σ 2

t = ω +∑q

i=1 αig(ηt−i )+∑p

j=1 βj logσ 2t−j ,

(10.3)

where

g(ηt−i ) = θηt−i + ς (|ηt−i | − E|ηt−i |) , (10.4)

and ω, αi , βj , θ and ς are real numbers.

1Recall, however, that for a noise which is conditionally heteroscedastic, the valid asymptotic bounds atthe 95% significancy level are not ±1.96/

√n (see Chapter 5).

2 When the price of a stock falls, the debt–equity ratio of the company increases. This entails an increaseof the risk and hence of the volatility of the stock. When the price rises, the volatility also increases but by asmaller amount.

Page 263: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 247

Remark 10.1 (On the EGARCH model)

1. The relation

σ 2t = eω

q∏i=1

exp{αig(ηt−i )}p∏

j=1

(σ 2t−j)βj

shows that, in contrast to the classical GARCH, the volatility has a multiplicativedynamics. The positivity constraints on the coefficients can be avoided, because thelogarithm can be of any sign.

2. According to the usual interpretation, however, innovations of large modulus should increasevolatility. This entails constraints on the coefficients: for instance, if log σ 2

t = ω + θηt−1 +ς (|ηt−1| − E|ηt−1|) , σ 2

t increases with |ηt−1|, the sign of ηt−1 being fixed, if and only if−ς < θ < ς . In the general case it suffices to impose

−ς < θ < ς, αi ≥ 0, βj ≥ 0.

3. The asymmetry property is taken into account through the coefficient θ . For instance, letθ < 0 and log σ 2

t = ω + θηt−1: if ηt−1 < 0 (that is, if εt−1 < 0), the variable logσ 2t will

be larger than its mean ω, and it will be smaller if εt−1 > 0. Thus, we obtain the typicalasymmetry property of financial time series.

4. Another difference from the classical GARCH is that the conditional variance is written asa function of the past standardized innovations (that is, divided by their conditional standarddeviation), instead of the past innovations. In particular, log σ 2

t is a strong ARMA(p, q − q ′)process, where q ′ is the first integer i such that αi = 0, because (g(ηt )) is a strong whitenoise, with variance

Var[g(ηt )] = θ2 + ς2Var(|ηt |)+ 2θςCov(ηt , |ηt |).5. A formulation which is very close to the EGARCH is the Log-GARCH, defined by

log σ 2t = ω +

q∑i=1

αi log(|εt−i | − ςiεt−i )+p∑j=1

βj log σ 2t−j ,

where, obviously, one has to impose |ςi | < 1.

6. The specification (10.4) allows for sign effects, through θηt−i , and for modulus effectsthrough ς (|ηt−i | − E|ηt−i |). This obviously induces, however, at least in the case q = 1,an identifiability problem, which can be solved by setting ς = 1. Note also that, to allowdifferent sign effects for the different lags, one could make θ depend on the lag index i,through the formulation

εt = σtηtlogσ 2

t = ω +∑q

i=1 αi {θiηt−i + (|ηt−i | − E|ηt−i |)}+∑p

j=1 βj log σ 2t−j .

(10.5)

As we have seen, specifications of the function g(·) that are different from (10.4) are possible,depending on the kind of empirical properties we are trying to mimic. The following result doesnot depend on the specification chosen for g(·). It is, however, assumed that Eg(ηt ) exists and isequal to 0.

Theorem 10.1 (Stationarity of the EGARCH(p, q) process) Assume that g(ηt ) is not almostsurely equal to zero and that the polynomials α(z) =∑q

i=1 αizi and β(z) = 1−∑p

i=1 βizi have no

common root, with α(z) not identically null. Then, the EGARCH(p, q)model defined in (10.3) admits

Page 264: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

248 GARCH MODELS

a strictly stationary and nonanticipative solution if and only if the roots of β(z) are outside the unitcircle. This solution is such that E(log ε2

t )2 <∞ whenever E(log η2

t )2 <∞ and Eg2(ηt ) <∞.

If, in addition,∞∏i=1

E exp{|λig(ηt )|} <∞, (10.6)

where the λi are defined by α(L)/β(L) =∑∞i=1 λiL

i , then (εt ) is a white noise with variance

E(ε2t ) = E(σ 2

t ) = eω∗∞∏i=1

gη(λi),

where ω∗ = ω/β(1) and gη(x) = E[exp{xg(ηt )}].

Proof. We have log ε2t = log σ 2

t + log η2t . Because logσ 2

t is the solution of an ARMA(p, q − 1)model, with AR polynomial β, the assumptions made on the lag polynomials are necessary andsufficient to express, in a unique way, logσ 2

t as an infinite-order moving average:

log σ 2t = ω∗ +

∞∑i=1

λig(ηt−i ), a.s.

It follows that the processes (log σ 2t ) and (log ε2

t ) are strictly stationary. The process (log σ 2t ) is

second-order stationary and, under the assumption E(log η2t )

2 <∞, so is (log ε2t ). Moreover, using

the previous expansion,

ε2t = σ 2

t η2t = eω

∗∞∏i=1

exp{λig(ηt−i )}η2t , a.s. (10.7)

Using the fact that the process g(ηt ) is iid, we get the desired result on the expectation of (ε2t )

(Exercise 10.4). �

Remark 10.2

1. When βj = 0 for j = 1, . . . , p (EARCH(q) model), the coefficients λi cancel for i > q.Hence, condition (10.6) is always satisfied, provided that E exp{|αig(ηt )|} <∞, for i =1, . . . , q. If the tails of the distribution of ηt are not too heavy (the condition fails for theStudent t distributions and specification (10.4)), an EARCH(q) process is then stationary,in both the strict and second-order senses, whatever the values of the coefficients αi .

2. When ηt is N(0, 1) distributed, and if g(·) is such that (10.4) holds, one can verify(Exercise 10.5) that

logE exp {|λig(ηt )| = O(λi). (10.8)

Since the λi are obtained from the inversion of the polynomial β(·), they decrease expo-nentially fast to zero. It is then easy to check that (10.6) holds true in this case, without anysupplementary assumption on the model coefficients. The strict and second-order stationar-ity conditions thus coincide, contrary to what happened in the standard GARCH case. Tocompute the second-order moments, classical integration calculus shows that

gη(λi) = exp

{−λiς

√2

π

}[exp

{λ2i (θ + ς)2

2

}�{λi(θ + ς)}

+ exp

{λ2i (θ − ς)2

2

}�{λi(ς − θ)}

],

where � denotes the cumulative distribution function of the N(0, 1).

Page 265: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 249

Theorem 10.2 (Moments of the EGARCH(p, q) process) Let m be a positive integer. Underthe conditions of Theorem 10.1 and if

µ2m = E(η2mt ) <∞,

∞∏i=1

E exp{|mλig(ηt )|} <∞,

(ε2t ) admits a moment of order m given by

E(ε2mt ) = µ2me

mω∗∞∏i=1

gη(mλi).

Proof. The result straightforwardly follows from (10.7) and Exercise 10.4. �

The previous computation shows that in the Gaussian case, moments exist at any order. This showsthat the leptokurticity property may be more difficult to capture with EGARCH than with standardGARCH models.

Assuming that E(log η2t )

2 <∞, the autocorrelation structure of the process (log ε2t ) can be

derived by taking advantage of the ARMA form of the dynamics of logσ 2t . Indeed, replacing the

terms in log σ 2t−j by log ε2

t−j − log η2t−j , we get

log ε2t = ω + log η2

t +q∑i=1

αig(ηt−i )+p∑j=1

βj log ε2t−j −

p∑j=1

βj log η2t−j .

Let

vt = log ε2t −

p∑j=1

βj log ε2t−j = ω + log η2

t +q∑i=1

αig(ηt−i )−p∑j=1

βj log η2t−j .

One can easily verify that (vt ) has finite variance. Since vt only depends on a finite number r(r = max(p, q)) of past values of ηt , it is clear that Cov(vt , vt−k) = 0 for k > r. It follows that(vt ) is an MA(r) process (with intercept) and thus that (log ε2

t ) is an ARMA(p, r) process. Thisresult is analogous to that obtained for the classical GARCH models, for which an ARMA(r, p)representation was exhibited for ε2

t . Apart from the inversion of the integers r and p, it is importantto note that the noise of the ARMA equation of a GARCH is the strong innovation of the square,whereas the noise involved in the ARMA equation of an EGARCH is generally not the stronginnovation of log ε2

t . Under this limitation, the ARMA representation can be used to identify theorders p and q and to estimate the parameters βj and αi (although the latter do not explicitlyappear in the representation).

The autocorrelations of (ε2t ) can be obtained from formula (10.7). Provided the moments exist

we have, for h> 0,

E(ε2t ε

2t−h) = E

{e2ω∗

h−1∏i=1

exp{λig(ηt−i )}η2t η

2t−h exp{λhg(ηt−h)}

×∞∏

i=h+1

exp{(λi + λi−h)g(ηt−i )}}

= e2ω∗{h−1∏i=1

gη(λi)

}E(η2

t−h exp{λhg(ηt−h)})∞∏

i=h+1

gη(λi + λi−h),

Page 266: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

250 GARCH MODELS

the first product being replaced by 1 if h = 1. For h> 0, this leads to

Cov(ε2t , ε

2t−h) = e2ω∗

[h−1∏i=1

gη(λi)E(η2t−h exp{λhg(ηt−h)})

∞∏i=h+1

gη(λi + λi−h)−∞∏i=1

{gη(λi)}2].

10.2 Threshold GARCH Model

A natural way to introduce asymmetry is to specify the conditional variance as a function of thepositive and negative parts of the past innovations. Recall that

ε+t = max(εt , 0), ε−t = min(εt , 0)

and note that εt = ε+t + ε−t . The threshold GARCH (TGARCH) class of models introduces athreshold effect into the volatility.

Definition 10.2 (TGARCH(p, q) process) Let (ηt ) be an iid sequence of random variables suchthat E(ηt ) = 0 and Var(ηt ) = 1. Then (εt ) is called a threshold GARCH(p, q) process if it satisfiesan equation of the form{

εt = σtηtσt = ω +∑q

i=1 αi,+ε+t−i − αi,−ε−t−i +

∑p

j=1 βjσt−j ,(10.9)

where ω, αi,+, αi,− and βi are real numbers.

Remark 10.3 (On the TGARCH model)

1. Under the constraints

ω> 0, αi,+ ≥ 0, αi,− ≥ 0, βi ≥ 0, (10.10)

the variable σt is always strictly positive and represents the conditional standard deviationof εt . In general, the conditional standard deviation of εt is |σt |: imposing the positivity ofσt is not required (contrary to the classical GARCH models, based on the specificationof σ 2

t ).

2. The GJR-GARCH model (named for Glosten, Jagannathan and Runkle, 1993) is a variant,defined by

σ 2t = ω +

q∑i=1

αiε2t−i + γiε

2t−i 1l{εt−i > 0} +

p∑j=1

βjσ2j ,

which corresponds to squaring the variables involved in the second equation of (10.9).

3. Through the coefficients αi,+ and αi,−, the current volatility depends on both the mod-ulus and the sign of past returns. The model is flexible, allowing the lags i of the pastreturns to display different asymmetries. Note also that this class contains, as special cases,models displaying no asymmetry, whose properties are very similar to those of the standardGARCH. Such models are obtained for αi,+ = αi,− := αi (i = 1, . . . , q) and take the form

σt = ω +q∑i=1

αi |εt−i | +p∑j=1

βjσt−j

(since |εt | = ε+t − ε−t ). This specification is called absolute value GARCH (AVGARCH).Whether it is preferable to model the conditional variance or the conditional standarddeviation is an open issue. However, it must be noted that for regression models with

Page 267: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 251

non-Gaussian and heteroscedastic errors, one can show that estimators of the noise variancebased on the absolute residuals are more efficient than those based on the squared residuals(see Davidian and Carroll, 1987).

Figure 10.1 depicts the major difference between GARCH and TGARCH models. The so-called ‘news impact curves’ display the impact of the innovations at time t − 1 on the volatility attime t , for first-order models. In this figure, the coefficients have been chosen in such a way thatthe marginal variances of εt in the two models coincide. In this TARCH example, in accordancewith the properties of financial time series, negative past values of εt−1 have more impact on thevolatility than positive values of the same magnitude. The impact is, of course, symmetrical in theARCH case.

TGARCH models display linearity properties similar to those encountered for the GARCH.Under the positivity constraints (10.10), we have

ε+t = σtη+t , ε−t = σtη

−t , (10.11)

which allows us to write the conditional standard deviation in the form

σt = ω +max{p,q}∑

i=1

ai(ηt−i )σt−i (10.12)

where ai(z) = αi,+z+ − αi,−z− + βi, i = 1, . . . ,max{p, q}. The dynamics of σt is thus given bya random coefficient autoregressive model.

Stationarity of the TGARCH(1, 1) Model

The study of the stationarity properties of the TGARCH(1, 1) model is based on (10.12) and followsfrom similar arguments to the GARCH(1, 1) case. The strict stationarity condition is written as

E[log(α1,+η+t − α1,−η−t + β1)] < 0. (10.13)

−10 −5 5 10 t –1

10

20

30

40s2

t –1

Figure 10.1 News impact curves for the ARCH(1) model, εt =√

1+ 0.38ε2t−1ηt (dashed line),

and TARCH(1) model, εt = (1− 0.5ε−t−1 + 0.2ε+t−1)ηt (solid line).

Page 268: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

252 GARCH MODELS

In particular, for the TARCH(1) model (β1 = 0) we have

log(α1,+η+t − α1,−η−t ) = log(α1,+) 1l{ηt > 0} + log(α1,−) 1l{ηt<0} + log |ηt |.

Hence, if the distribution of (ηt ) is symmetric the expectation of the two indicator variables isequal to 1/2 and the strict stationarity condition reduces to

α1,+α1,− < e−2E log |ηt |.

Exercise 10.8 shows that the second-order stationarity condition is

E[(α1,+η+t − α1,−η−t + β1)2] < 1. (10.14)

This condition can be made explicit in terms of the first two moments of η+t and η−t . For instance,if ηt is N(0, 1) distributed, we get

1

2(α2

1,+ + α21,−)+

2β1√2π

(α1,+ + α1,−)+ β21 < 1. (10.15)

Of course, the second-order stationarity condition is more restrictive than the strict stationaritycondition (see Figure 10.2).

Under the second-order stationarity condition, it is easily seen that the property of symmetry(10.1) is generally violated. For instance if the distribution of ηt is symmetric, we have, for theTARCH(1) model:

Cov(σt , εt−1) = α1,+E(ε+t−1)2 − α1,−E(ε−t−i )

2 = (α1,+ − α1,−)E(ε+t−1)2 = 0

whenever α1,+ = α1,−.

2 4 6 8 a1, +

a1, −

2

4

6

8

1

2

3

Figure 10.2 Stationarity regions for the TARCH(1) model with ηt ∼ N(0, 1): 1, second-orderstationarity; 1 and 2, strict stationarity; 3, nonstationarity.

Page 269: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 253

Strict Stationarity of the TGARCH(p, q) Model

The study of the general case relies on a representation analogous to (2.16), obtained by replacing, inthe vector z

t, the variables ε2

t−i by (ε+t−i ,−ε−t−i )′, the σ 2t−i by σt−i , and by an adequate modification

of bt and At . Specifically, using (10.11), we get

zt= bt + Atzt−1, (10.16)

where

bt = b(ηt ) =

ωη+t−ωη−t

0...

ω

0...

0

∈ Rp+2q, z

t=

ε+t−ε−t...

ε+t−q+1−ε−t−q+1

σt...

σt−p+1

∈ Rp+2q,

and

At =

η+t α1: q−1 αq,+η+t αq,−η+t η+t β1:p−1 βpη

+t

−η−t α1: q−1 −αq,+η−t −αq,−η−t −η−t β1:p−1 −βpη−tI2q−2 02q−2 02q−2 02q−2×p−1 02q−2

α1: q−1 αq,+ αq,− β1:p−1 βp0p−1×2q−2 0p−1 0p−1 Ip−1 0p−1

(10.17)

is a matrix of size (p + 2q)× (p + 2q),

α1: q−1 =(α1,+, α1,−, . . . , αq−1,+, αq−1,−

) ∈ R2q−2,

β1:p−1 =(β1, · · · , βp−1

) ∈ Rp−1.

The following result is analogous to that obtained for the strict stationarity of theGARCH(p, q).

Theorem 10.3 (Strict stationarity of the TGARCH(p, q) model) A necessary and suffi-cient condition for the existence of a strictly stationary and nonanticipative solution of theTGARCH(p, q) model (10.9)–(10.10) is that γ < 0, where γ is the top Lyapunov exponent of thesequence {At , t ∈ Z} defined by (10.17).

This stationary and nonanticipative solution, when γ < 0, is unique and ergodic.

Proof. The sufficient part of the proof of Theorem 2.4 can be straightforwardly adapted. As forthe necessary part, note that the coefficients of the matrices At , bt and z

tare positive. This allows

us to show, as was done previously, that A0 . . . A−kb−k−1 tends to 0 almost surely when k→∞.But since b−k−1 = ωη+−k−1e1 − ωη−−k−1e2 + ωe2q+1, using the positivity, we have

limk→∞

A0 . . . A−kωη+−k−1e1 = limk→∞

A0 . . . A−kωη−−k−1e2

= limk→∞

A0 . . . A−kωe2q+1 = 0, a.s.

It follows that limk→∞ A0 . . . A−kei = 0 a.s. for i = 1, . . . 2q + 1 by induction, as in the GARCHcase. �

Page 270: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

254 GARCH MODELS

Numerical evaluation, by means of simulation, of the Lyapunov coefficient γ can be time-consuming because of the large size of the matrices At . A condition involving matrices ofsmaller dimensions can sometimes be obtained. Suppose that the asymmetric effects have afactorization the form αi− = θαi+ for all lags i= 1, . . . , q. In this constrained model, theasymmetry is summarized by only one parameter θ = 1, the case θ > 1 giving more importanceto the negative returns.

Theorem 10.4 (Strict stationarity of the constrained TGARCH(p, q) model) A necessaryand sufficient condition for the existence of a strictly stationary and nonanticipative solution ofthe TGARCH(p, q) model (10.9), in which the coefficients ω, αi,− and αi,+ satisfy the positivityconditions (10.10) and the q − 1 constraints

α1,− = θα1,+, α2,− = θα2,+, . . . , αq,− = θαq,+,

is that γ ∗ < 0, where γ ∗ is the top Lyapunov exponent of the sequence of (p + q − 1)× (p + q − 1)matrices {A∗t , t ∈ Z} defined by

A∗t =

0 · · · 0 θ(ηt−1) 0 · · · 0Iq−2 0q−2×p+1

α2,+ · · · αq,+ α1,+θ(ηt−1)+ β1 β2 · · · βp0p−1×q−1 Ip−1 0p−1

, (10.18)

where θ(ηt ) = η+t − θη−t . This stationary and nonanticipative solution, when γ ∗ < 0, is uniqueand ergodic.

Proof. If the constrained TGARCH model admits a stationary solution (σt , εt ), then a stationarysolution exists for the model

z∗t= b∗t + A∗t z

∗t−1, (10.19)

where

b∗t = 0q−1

ω

0p−1

∈ Rp+q−1, z∗t=

ε+t−1 − θε−t−1...

ε+t−q+1 − θε−t−q+1σt...

σt−p+1

∈ Rp+q−1.

Conversely, if (10.19) admits a stationary solution, then the constrained TGARCH model admitsthe stationary solution (σt , εt ) defined by σt = z∗

t(q) (the qth component of z∗

t) and εt = σtηt .

Thus the constrained TGARCH model admits a strictly stationary solution if and only if model(10.19) has a strictly stationary solution. It can be seen that limk→∞ A∗0 · · ·A∗−kb∗−k−1 = 0 implieslimk→∞ A∗0 · · ·A∗−kei = 0 for i = 1, . . . , p + q − 1, using the independence of the matrices in theproduct A∗0 · · ·A∗−kb∗−k−1 and noting that, in the case where θ(ηt ) is not almost surely equal tozero, the qth component of b∗−k−1, the first and (q + 1)th components of A∗−kb

∗−k−1, the second

and (q + 2)th components of A∗−k+1A∗−kb

∗−k−1, etc., are strictly positive with nonzero probability.

In the case where θ(ηt ) = 0, the first q − 1 rows of A∗0 · · ·A∗−q+2 are null, which obviously showsthat limk→∞ A∗0 · · ·A∗−kei = 0 for i = 1, . . . , q − 1. For i = q, . . . , p + q − 1, the argument usedin the case θ(ηt ) = 0 remains valid. The rest of the proof is similar to that of Theorem 2.4. �

Page 271: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 255

mth-Order Stationarity of the TGARCH(p, q) Model

Contrary to the standard GARCH model, the odd-order moments are not more difficult to obtainthan the even-order ones for a TGARCH model. The existence condition for such moments isprovided by the following theorem.

Theorem 10.5 (mth-order stationarity) Let m be a positive integer. Suppose that E(|ηt |m) <∞. Let A(m) = E(A⊗mt ) where At is defined by (10.16). If the spectral radius

ρ(A(m)) < 1,

then, for any t ∈ Z, the infinite sum (zt) is a strictly stationary solution of (10.16) which con-

verges in Lm and the process (εt ), defined by εt = e′2q+1ztηt , is a strictly stationary solution of theTGARCH(p, q) model defined by (10.9), and admits moments up to order m.

Conversely, if ρ(A(m)) ≥ 1, there exists no strictly stationary solution (εt ) of (10.9) satisfyingthe positivity conditions (10.10) and the moment condition E(|εt |m) <∞.

The proof of this theorem is identical to that of Theorem 2.9.

Kurtosis of the TGARCH(1, 1) Model

For the TGARCH(1, 1) model with positive coefficients, the condition for the existence of E|εt |mcan be obtained directly. Using the representation

σt = ω + a(ηt−1)σt−1, a(η) = α1,+η+ − α1,−η− + β1,

we find that Eσmt exists and satisfies

Eσmt =m∑k=0

Ckmω

kEam−k(ηt−1)Eσm−kt

if and only if

Eam(ηt ) < 1. (10.20)

If this condition is satisfied for m = 4, then the kurtosis coefficient exists. Moreover, if ηt ∼ N(0, 1)we get

κε = 3Eσ 4

t

(Eσ 2t )

2,

and, using the notation ai = Eai(ηt ), the moments can be computed successively as

a1 = 1√2π

(α1,+ + α1,−

)+ β1,

Eσt = ω

1− a1,

a2 = 1

2

(α2

1,+ + α21,−)+ 2√

2πβ1(α1,+ + α1,−

)+ β21 ,

Eσ 2t =

ω2 {1+ a1}{1− a1} {1− a2} ,

Page 272: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

256 GARCH MODELS

a3 =√

2

π

(α3

1,+ + α31,−)+ 3

2β1(α2

1,+ + α21,−)+ 3√

2πβ2

1

(α1,+ + α1,−

)+ β31 ,

Eσ 3t =

ω3 {1+ 2a1 + 2a2 + a1a2}{1− a1} {1− a2} {1− a3} ,

a4 = 3

2

(α4

1,+ + α41,−)+ 4

√2

πβ1(α3

1,+ + α31,−)+ β4

1

+ 3β21

(α2

1,+ + α21,−)+ 4√

2πβ3

1

(α1,+ + α1,−

),

Eσ 4t =

ω4 {1+ 3a1 + 5a2 + 3a1a2 + 3a3 + 5a1a3 + 3a2a3 + a1a2a3}{1− a1} {1− a2} {1− a3} .

Many moments of the TGARCH(1, 1) can be obtained similarly, such as the autocorrelations ofthe absolute values (Exercise 10.9) and squares, but the calculations can be tedious.

10.3 Asymmetric Power GARCH Model

The following class is very general and contains the standard GARCH, the TGARCH, and theLog-GARCH.

Definition 10.3 (APARCH(p, q) process) Let (ηt ) be a sequence of iid variables such thatE(ηt ) = 0 and Var(ηt ) = 1. The process (εt ) is called an asymmetric power GARCH(p, q)) if itsatisfies an equation of the form{

εt = σtηtσ δt = ω +∑q

i=1 αi (|εt−i | − ςiεt−i )δ +∑p

j=1 βjσδt−j ,

(10.21)

where ω> 0, δ > 0, αi ≥ 0, βi ≥ 0 and |ςi | ≤ 1.

Remark 10.4 (On the APARCH model)

1. The standard GARCH(p, q) is obtained for δ = 2 and ς1 = · · · = ςq = 0.

2. To study the role of the parameter ςi , let us consider the simplest case, the asymmetricARCH(1) model. We have

σ 2t =

{ω + α1(1− ς1)

2ε2t−1, if εt−1 ≥ 0,

ω + α1(1+ ς1)2ε2

t−1, if εt−1 ≤ 0.(10.22)

Hence, the choice of ςi > 0 ensures that negative innovations have more impact on thecurrent volatility than positive ones of the same modulus. Similarly, for more complexAPARCH models, the constraint ςi ≥ 0 is a natural way to capture the typical asymmetricproperty of financial series.

3. Sinceαi |1± ςi |δεδt−i = αi |ςi |δ |1± 1/ςi |δεδt−i ,

|ςi | ≤ 1 is a nonrestrictive identifiability constraint.

4. If δ = 1, the model reduces to the TGARCH model. Using log σt = limδ→0(σδt − 1)/δ, one

can interpret the Log-GARCH model as the limit of the APARCH model when δ→ 0.The novelty of the APARCH model is in the introduction of the parameter δ. Note that

Page 273: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 257

autocorrelations of the absolute returns are often larger than autocorrelations of the squares.The introduction of the power δ increases the flexibility of GARCH-type models, and allowsthe a priori selection of an arbitrary power to be avoided.

Noting that {εt−i > 0} = {ηt−i > 0}, one can write

σ δt = ω +max{p,q}∑

i=1

ai(ηt−i )σ δt−i (10.23)

where

ai(z) = αi(|z| − ςz)δ + βi

= αi(1− ςi)δ |z|δ 1l{z> 0} +αi(1+ ς)δ |z|δ 1l{z<0} +βi,

for i = 1, . . . ,max{p, q}.

Stationarity of the APARCH(1, 1) Model

Relation (10.23) is an extension of (2.6) which allows us to obtain the stationarity conditions, as inthe classical GARCH(1, 1) case. The necessary and sufficient strict stationarity condition is thus

E log{α1(1− ς1)δ |ηt |δ 1l{ηt > 0} +α1(1+ ς1)

δ |ηt |δ 1l{ηt<0} +β1} < 0. (10.24)

For the APARCH(1, 0) model, we have

log{α1(1− ς1)δ|ηt |δ 1l{ηt > 0} +α1(1+ ς1)

δ|ηt |δ 1l{ηt<0}}= log(1− ς1)

δ 1l{ηt > 0} + log(1+ ς1)δ 1l{ηt<0} + logα1|ηt |δ,

showing that, if the distribution of (ηt ) symmetric, the strict stationarity condition reduces to

|1− ς1|δ/2 |1+ ς1|δ/2 α1 < e−E log |ηt |δ .

Note that in the limit case where |ς1| = 1, the model is strictly stationary for any value of α1, asmight be expected. Under condition (10.24), the strictly stationary solution is given by

εt = σtηt , σ δt = ω +∞∑k=1

a1(ηt ) · · · a1(ηt−k+1)ω.

Assuming E|ηt |δ <∞, the condition for the existence of Eεδt (and of Eσδt ) is

Ea1(ηt ) = α1{(1− ς1)

δEηδt 1l{ηt > 0} +(1+ ς1)δE|ηt |δ 1l{ηt<0}

}+ β1 < 1, (10.25)

which reduces to1

2E|ηt |δα1

{(1+ ς1)

δ + (1− ς1)δ}+ β1 < 1

when the distribution of (ηt ) symmetric, with

E|ηt |δ =√

π�

(1+ δ

2

)when ηt is Gaussian (� denoting the Euler gamma function). Figure 10.3 shows the strict andsecond-order stationarity regions of the APARCH(1, 0) model when ηt is Gaussian.

Page 274: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

258 GARCH MODELS

−1 −0.5 0.5 1

2

4

6

8

a1

V1

1

2

1

3

2

3

Figure 10.3 Stationarity regions for the APARCH(1,0) model with ηt ∼ N(0, 1): 1, second-orderstationarity; 1 and 2, strict stationarity; 3, nonstationarity.

Obviously, if δ ≥ 2 condition (10.25) is sufficient (but not necessary) for the existence of astrictly stationary and second-order stationary solution to the APARCH(1, 1) model. If δ ≤ 2, con-dition (10.25) is necessary (but not sufficient) for the existence of a second-order stationary solution.

10.4 Other Asymmetric GARCH Models

Among other asymmetric GARCH models, which we will not study in detail, let us mentionthe qualitative threshold ARCH (QTARCH) model, and the quadratic GARCH model (QGARCHor GQARCH), generalizing Example 4.2 in Chapter 4. The first-order model of this class, theQGARCH(1, 1), is defined by

εt = σtηt , σ 2t = ω + αε2

t−1 + ςεt−1 + βσ 2t−1, (10.26)

where ηt is a strong white noise with unit variance.

Remark 10.5 (On the QGARCH(1,1) model)

1. The function x �→ αx2 + ςx has its minimum at x = −ς/2α, and this minimum is −ς2/4α.A condition ensuring the positivity of σ 2

t is thus ω>−ς2/4α. This can also be seen bywriting

σ 2t = ω − ς2/4α + (

√αεt−1 + ς/2

√α)2 + βσ 2

t−1.

2. The conditionα1 + β1 < 1

is clearly necessary for the existence of a nonanticipative and second-order stationary solu-tion, but it seems difficult to prove that this condition suffices for the existence of a solution.Equation (10.26) cannot be easily expanded because of the presence of ε2

t−1 = σ 2t−1η

2t−1 and

εt−1 = σt−1ηt−1. It is therefore not possible to obtain an explicit solution as a function ofthe ηt−i . This makes QGARCH models much less tractable than the asymmetric modelsstudied in this chapter.

Page 275: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 259

3. The asymmetric effect is taken into account through the coefficient ς . A negative coefficiententails that negative returns have a bigger impact on the volatility of the next period thanpositive ones. A small price increase, such that the return is less than −ς/2α with ς > 0,can even produce less volatility than a zero return. This is a distinctive feature of thismodel, compared to the EGARCH, TGARCH or GJR-GARCH for which, by appropriatelyconstraining the parameters, the volatility at time t is minimal in the absence of pricemovement at time t − 1.

Many other asymmetric GARCH models have been introduced. Complex asymmetric responsesto past values may be considered. For instance, in the model

σt = ω + α|εt−1| 1l{|εt−1|≤γ } +α+εt−1 1l{εt−1 >γ } −α−εt−1 1l{εt−1<−γ }, α, α+, α−, γ ≥ 0,

asymmetry is only present for large innovations (whose amplitude is larger than the threshold γ ).

10.5 A GARCH Model with ContemporaneousConditional Asymmetry

A common feature of the GARCH models studied up to now is the decomposition

εt = σtηt ,

where σt is a positive variable and (ηt ) is an iid process. The various models differ by thespecification of σt as a measurable function of the εt−i for i > 0. This type of formulation impliesseveral important restrictions:

(i) The process (εt ) is a martingale difference.

(ii) The positive and negative parts of εt have the same volatility, up to a multiplicative factor.

(iii) The kurtosis and skewness of the conditional distribution of εt are constant.

Property (ii) is an immediate consequence of the equalities in (10.11). Property (iii) expressesthe fact that the conditional law of εt has the same ‘shape’ (symmetric or asymmetric, unimodalor polymodal, with or without heavy tails) as the law of ηt .

It can be shown empirically that these properties are generally not satisfied by financial timeseries. Estimated kurtosis and skewness coefficients of the conditional distribution often presentlarge variations in time. Moreover, property (i) implies that Cov(εt , zt−1) = 0, for any variablezt−1 ∈ L2 which is a measurable function of the past of εt . In particular, one must have

∀h> 0, Cov(εt , ε+t−h) = Cov(εt , ε

−t−h) = 0 (10.27)

or, equivalently,

∀h> 0, Cov(ε+t , ε+t−h) = Cov(−ε−t , ε+t−h), Cov(ε+t , ε

−t−h) = Cov(−ε−t , ε−t−h). (10.28)

We emphasize the difference between (10.27) and the characterization (10.2) of the asymmetrystudied previously. When (10.27) does not hold, one can speak of contemporaneous asymmetrysince the variables ε+t and −ε−t , of the current date, do not have the same conditional distribution.

For the CAC index series, Table 10.2 completes Table 10.1, by providing the cross empiricalautocorrelations of the positive and negative parts of the returns.

Page 276: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

260 GARCH MODELS

Table 10.2 Empirical autocorrelations (CAC 40, for the period 1988–1998).

h 1 2 3 4 5 10 20 40

ρ(ε+t , ε+t−h) 0.037 −0.006 −0.013 0.029 −0.039* 0.017 0.023 0.001

ρ(−ε−t , ε+t−h) −0.013 −0.035 −0.019 −0.025 −0.028 −0.007 −0.020 0.017ρ(ε+t ,−ε−t−h) 0.026 0.088* 0.135* 0.047* 0.088* 0.056* 0.049* 0.065*

ρ(ε−t , ε−t−h) 0.060* 0.074* 0.041* 0.070* 0.027 0.077* 0.015 −0.008*

∗indicate parameters that are statistically significant at the level 5%, using 1/n as an approximation for theautocorrelations variance, for n = 2385.

Without carrying out a formal test, comparison of rows 1 and 3 (or 2 and 4) shows that theleverage effect is present, whereas comparison of rows 3 and 4 shows that property (10.28) doesnot hold.

A class of GARCH-type models allowing the two kinds of asymmetry is defined as follows. Let

εt = σt,+η+t + σt,−η−t , t ∈ Z, (10.29)

where {ηt } is centered, ηt is independent of σt,+ and σt,−, and{σt,+ = α0,+ +

∑q

i=1 α+i,+ε

+t−i − α−i,+ε

−t−i +

∑p

j=1 β+j,+σt−j,+ + β−j,+σt−j,−

σt,− = α0,− +∑q

i=1 α+i,−ε

+t−i − α−i,−ε

−t−i +

∑p

j=1 β+j,−σt−j,+ + β−j,−σt−j,−

where α+i,+, α+i,−, . . . , β

−j,− ≥ 0, α0,+, α0,−> 0. Without loss of generality, it can be assumed that

E(η+t ) = E(−η−t ) = 1.As an immediate consequence of the positivity of σt,+ and σt,−, we obtain

ε+t = σt,+η+t and ε−t = σt,−η−t , (10.30)

which will be crucial for the study of this model.Thus, σt,+ and σt,− can be interpreted as the volatilities of the positive and negative parts

of the noise (up to a multiplicative constant, since we did not specify the variances of η+t andη−t ). In general, the nonanticipative solution of this model, when it exists, is not a martingaledifference because

E(εt | εt−1, . . .) = (σt,+ − σt,−)E(η+t ) = 0.

An exception is of course the situation where the parameters of the dynamics of σt,+ and σt,−coincide, in which case we obtain model (10.9).

A simple computation shows that the kurtosis coefficient of the conditional law of εt is given by

κt =∑4

k=0

(4k

)σ kt,+σ

4−kt,− c(k, 4− k)[∑2

k=0

(2k

)σ kt,+σ

2−kt,− c(k, 2− k)

]2 , (10.31)

where c(k, l) = E[{η+t − E(η+t )}k{η−t − E(η−t )}l], provided that E(η4t ) <∞. A similar computa-

tion can be done for the conditional skewness, showing that the shape of the conditional distributionvaries in time, in a more important way than for classical GARCH models.

Methods analogous to those developed for the other GARCH models allow us to obtain exis-tence conditions for the stationary and nonanticipative solutions (references are given at the end

Page 277: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 261

of the chapter). In contrast to the GARCH models analyzed previously, the stationary solution (εt )is not always a white noise.

10.6 Empirical Comparisons of Asymmetric GARCHFormulations

We will restrict ourselves to the simplest versions of the GARCH introduced in this chapter, andconsider their fit to the series of CAC 40 index returns, rt , over the period 1988–1998 consistingof 2385 values.

Descriptive Statistics

Figure 10.4 displays the first 500 values of the series. The volatility clustering phenomenon isclearly evident. The correlograms in Figure 10.5 indicate absence of autocorrelation. However,squared returns present significant autocorrelations, which is another sign that the returns arenot independent. Ljung–Box portmanteau tests, such as those available in SAS (see Table 10.3;Chapter 5 gives more details on these tests), confirm the visual analysis provided by the correl-ograms. The left-hand graph of Figure 10.6, compared to the right-hand graph of Figure 10.5,seems to indicate that the absolute returns are slightly more strongly correlated than the squares.The right-hand graph of Figure 10.6 displays empirical correlations between the series |rt | andrt−h. It can be seen that these correlations are negative, which implies the presence of leverageeffects (more accentuated, apparently, for lags 2 and 3 than for lag 1).

100 200 300 400 500

0.001

0.002

0.003

0.004

−0.06

−0.04

−0.02

0.02

0.04

Figure 10.4 The first 500 values of the CAC 40 index (left) and of the squared index (right).

2 4 6 8 10 12 14

0.05

0.1

0.15

0.2

0.25

0.3

2 4 6 8 10 12 14−0.05

0.05

0.1

0.15

0.2

0.25

0.3

Figure 10.5 Correlograms of the CAC 40 index (left) and the squared index (right). Dashed linescorrespond to ±1.96/

√n.

Page 278: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

262 GARCH MODELS

Table 10.3 Portmanteau test of the white noise hypothesis for the CAC 40 series (upper panel)and for the squared index (lower panel).

Autocorrelation Check for White Noise

To Chi- Pr >

Lag Square DF Khi2 -----------------Autocorrelations-----------------

6 11.51 6 0.0737 0.030 0.005 -0.032 0.028 -0.046 -0.001

12 16.99 12 0.1499 -0.018 -0.014 0.034 0.016 0.017 0.010

18 21.22 18 0.2685 -0.005 0.025 -0.031 -0.009 -0.003 0.006

24 27.20 24 0.2954 -0.023 0.003 -0.010 0.030 -0.027 -0.015

Autocorrelation Check for White Noise

To Chi- Pr >

Lag Square DF Khi2 -----------------Autocorrelations-----------------

6 165.90 6 <.0001 0.129 0.127 0.117 0.084 0.101 0.074

12 222.93 12 <.0001 0.051 0.060 0.070 0.092 0.058 0.030

18 238.11 18 <.0001 0.053 0.036 0.020 0.041 0.002 0.013

24 240.04 24 <.0001 0.006 0.024 0.013 0.003 0.001 -0.002

2 4 6 8 10 12 14

−0.1

0.1

0.2

0.3

2 4 6 8 10 12 14

0.05

0.1

0.15

0.2

0.25

0.3

Figure 10.6 Correlogram h �→ ρ(|rt |, |rt−h|) of the absolute CAC 40 returns (left) and crosscorrelograms h �→ ρ(|rt |, rt−h) measuring the leverage effects (right).

Fit by Symmetric and Asymmetric GARCH Models

We will consider the classical GARCH(1, 1) model and the simplest asymmetric models (whichare the most widely used). Using the AUTOREG and MODEL procedures of SAS, the estimatedmodels are:

GARCH(1, 1) model

rt = 5× 10−4 + εt , εt = σtηt , ηt ∼ N(0, 1)

(2×10−4)

σ 2t = 8× 10−6 + 0.09 ε2

t−1+ 0.84 σ 2t−1

(2×10−6) (0.02) (0.02)

(10.32)

Page 279: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 263

EGARCH(1, 1) model

rt = 4× 10−4 + εt , εt = σt ηt , ηt ∼ N(0, 1)(2×10−4)

log σ 2t = −0.64 + 0.15 ( −0.53 ηt−1 + |ηt−1| −

√2/π)

(0.15) (0.03) (0.14)

+ 0.93 log σ 2t−1

(0.02)

(10.33)

QGARCH(1, 1) modelrt = 3× 10−4 + εt , εt = σt ηt , ηt ∼ N(0, 1)

(2. 10−4)

σ 2t = 9× 10−6 + 0.07 ε2

t−1 − 9× 10−4 εt−1 + 0.85 σ 2t−1

(2. 10−6) (0.01) (2×10−4) (0.03)

(10.34)

GJR-GARCH(1, 1) modelrt = 4× 10−4 + εt , εt = σt ηt , ηt ∼ N(0, 1)

(2×10−4)

σ 2t = 1× 10−5 + 0.13 ε2

t−1 − 0.10 ε2t−1 1l{εt−1 > 0} + 0.84 σ 2

t−1(2×10−6) (0.02) (0.02) (0.03)

(10.35)

TGARCH(1, 1) modelrt = 4× 10−4 + εt , εt = σt ηt , ηt ∼ N(0, 1)

(2×10−4)

σt = 8× 10−4 + 0.03 ε+t−1 − 0.12 ε−t−1 + 0.87 σt−1

(2×10−4) (0.01) (0.02) (0.02).

(10.36)

Interpretation of the Estimated Coefficients

Note that all the estimated models are stationary. The standard GARCH(1, 1) admits a fourth-ordermoment since, in view of the computation on page 45, we have 3α2 + β2 + 2αβ < 1. It is thuspossible to compute the variance and kurtosis in this estimated model (which are respectively equalto 1.3× 10−4 and 3.49 for the standard GARCH(1, 1)). Given the ARMA(1, 1) representation forε2t , we have ρε2(h) = (α + β)ρε2(h− 1) for any h> 1. Since α + β = 0.09+ 0.84 is close to 1,

the decay of ρε2(h) to zero will be slow when h→∞, which can be interpreted as a sign ofstrong persistence of shocks.3

Note that in the EGARCH model the parameter θ = −0.53 is negative, implying the presenceof the leverage effect. A similar interpretation can be given to the negative sign of the coefficientof εt−1 in the QGARCH model, and to that of ε2

t−1 1l{εt−1 > 0} in the GJR-GARCH model. In theTGARCH model, the leverage effect is present since α1,−>α1,+> 0.

The TGARCH model seems easier to interpret than the other asymmetric models. The volatility(that is, the conditional standard deviation) is the sum of four terms. The first is the interceptω = 8× 10−4. The term ω/(1− β1) = 0.006 can be interpreted as a ‘minimal volatility’, obtainedby assuming that all the innovations are equal to zero. The next two terms represent the impactof the last observation, distinguishing the sign of this observation, on the current volatility. In the

3 In the strict sense, and for any reasonable specification, shocks are nonpersistent because ∂σ 2t+h/∂εt → 0

a.s., but we wish to express the fact that, in some sense, the decay to 0 is slow.

Page 280: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

264 GARCH MODELS

Table 10.4 Likelihoods of the different models for the CAC 40 series.

GARCH EGARCH QGARCH GJR-GARCH TGARCHlogLn 7393 7404 7404 7406 7405

estimated model, the impact of a positive value is 3.5 times less than that of a negative one. Thelast coefficient measures the importance of the last volatility. Even in absence of news, the decayof the volatility is slow because the coefficient β1 = 0.87 is rather close to 1.

Likelihood Comparisons

Table 10.4 gives the log-likelihood, logLn, of the observations for the different models. One cannotdirectly compare the log-likelihood of the standard GARCH(1, 1) model, which has one parameterless, with that of the other models, but the log-likelihoods of the asymmetric models, which allhave five parameters, can be compared. The largest likelihood is observed for the GJR thresholdmodel, but, the difference being very slight, it is not clear that this model is really superior tothe others.

Resemblances between the Estimated Volatilities

Figure 10.7 shows that the estimated volatilities for the five models are very similar. It followsthat the different specifications produce very similar prediction intervals (see Figure 10.8).

Distances between Estimated Models

Differences can, however, be discerned between the various specifications. Table 10.5 gives aninsight into the distances between the estimated volatilities for the different models. From thispoint of view, the TGARCH and EGARCH models are very close, and are also the most distantfrom the standard GARCH. The QGARCH model is the closest to the standard GARCH. Rathersurprisingly, the TGARCH and GJR-GARCH models appear quite different. Indeed, the GJR-GARCH is a threshold model for the conditional variance and the TGARCH is a similar modelfor the conditional standard deviation.

Figure 10.9 confirms the results of Table 10.5. The left-hand scatterplot shows(σ 2t,TGARCH − σ 2

t,GARCH, σ2t,EGARCH − σ 2

t,GARCH

), t = 1, . . . , n,

and the right-hand one(σ 2t,TGARCH − σ 2

t,GARCH, σ2t,GJR−GARCH − σ 2

t,GARCH

)t = 1, . . . , n.

The left-hand graph shows that the difference between the estimated volatilities of the TGARCHand the standard GARCH, denoted by σ 2

t,TGARCH − σ 2t,GARCH, is always very close to the dif-

ference between the estimated volatilities of the EGARCH and the standard GARCH, denotedby σ 2

t,EGARCH − σ 2t,GARCH (the difference from the standard GARCH is introduced to make the

graphs more readable). The right-hand graph shows much more important differences between theTGARCH and GJR-GARCH specifications.

Comparison between Implied and Sample Values of the Persistenceand of the Leverage Effect

We now wish to compare, for the different models, the theoretical autocorrelations ρ(|rt |, |rt−h|)and ρ(|rt |, rt−h) to the empirical ones. The theoretical autocorrelations being difficult – if not

Page 281: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 265

100 200 300 400 500

1

2

3

4

5

−0.06

−0.04

−0.02

0.02

0.04

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

6

1

2

3

4

5

100 200 300 400 500

100 200 300 400 500100 200 300 400 500

100 200 300 400 500

Figure 10.7 From left to right and top to bottom, graph of the first 500 values of the CAC 40index and estimated volatilities (×104) for the GARCH(1, 1), EGARCH(1, 1), QGARCH(1, 1),GJR-GARCH(1, 1) and TGARCH(1, 1) models.

impossible – to obtain analytically, we used simulations of the estimated model to approximatethese theoretical autocorrelations by their empirical counterparts. The length of the simulations,50 000, seemed sufficient to obtain good accuracy (this was confirmed by comparing the empiricaland theoretical values when the latter were available).

Figure 10.10 shows satisfactory results for the standard GARCH model, as far as the auto-correlations of absolute values are concerned. Of course, this model is not able to reproduce thecorrelations induced by the leverage effect. Such autocorrelations are adequately reproduced bythe TARCH model, as can be seen from the top and bottom right panels. The autocorrelations forthe other asymmetric models are not reproduced here but are very similar to those of the TARCH.The negative correlations between rt and the rt−h appear similar to the empirical ones.

Implied and Empirical Kurtosis

Table 10.6 shows that the theoretical variances obtained from the estimated models are close tothe observed variance of the CAC 40 index. In contrast, the estimated kurtosis values are all muchbelow the observed value.

Page 282: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

266 GARCH MODELS

100 200 300 400 500

−0.06

−0.04

−0.02

0.02

0.04

0.06

100 200 300 400 500

−0.075

−0.05

−0.025

0.025

0.05

0.075

Figure 10.8 Returns rt of the CAC 40 index (solid lines) and confidence intervals r ± 3σt (dottedlines), where r is the empirical mean of the returns over the whole period 1988–1998 and σt isthe estimated volatility in the standard GARCH(1, 1) model (left) and in the EGARCH(1, 1)model (right).

Table 10.5 Means of the squared differences between the estimated volatilities (×1010).

GARCH EGARCH QGARCH GJR TGARCH

GARCH 0 10.98 3.58 7.64 12.71EGARCH 10.98 0 3.64 6.47 1.05QGARCH 3.58 3.64 0 3.25 4.69GJR 7.64 6.47 3.25 0 9.03TGARCH 12.71 1.05 4.69 9.03 0

Figure 10.9 Comparison of the estimated volatilities of the EGARCH and TARCH models (left),and of the TGARCH and GJR-GARCH models (right). The estimated volatilities are close whenthe scatterplot is elongated (see text).

In all these five models, the conditional distribution of the returns is assumed to be N(0, 1). Thischoice may be inadequate, which could explain the discrepancy between the estimated theoreticaland the empirical kurtosis. Moreover, the normality assumption is clearly rejected by statisticaltests, such as the Kolmogorov–Smirnov test, applied to the standardized returns. A leptokurticdistribution is observed for those standardized returns.

Table 10.7 reveals a large number of returns outside the interval [r − 3σt , r + 3σt ], whatever thespecification used for σt . If the conditional law were Gaussian and if the conditional variance were

Page 283: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 267

2 4 6 8 10 12 14

−0.1

−0.05

0.05

0.1

2 4 6 8 10 12 14

−0.05−0.025

0.0250.05

0.0750.1

0.125

2 4 6 8 10 12 14

−0.1

−0.05

0.05

0.1

2 4 6 8 10 12 14

−0.05−0.025

0.0250.05

0.0750.1

0.125

2 4 6 8 10 12 14

−0.1

−0.05

0.05

0.1

2 4 6 8 10 12 14

−0.05−0.025

0.0250.05

0.0750.1

0.125

Figure 10.10 Correlogram h �→ ρ(|rt |, |rt−h|) of the absolute values (left) and cross correlogramh �→ ρ(|rt |, rt−h) measuring the leverage effect (right), for the CAC 40 series (top), for the standardGARCH (middle), and for the TGARCH (bottom) estimated on the CAC 40 series.

Table 10.6 Variance (×104) and kurtosis of the CAC 40 index and of simulations of length50 000 of the five estimated models.

CAC 40 GARCH EGARCH QGARCH GJR TGARCH

Kurtosis 5.9 3.5 3.4 3.3 3.6 3.4Variance 1.3 1.3 1.3 1.3 1.3 1.3

correctly specified, the probability of one return falling outside the interval would be 2{1−�(3)} =0.0027, which would correspond to an average of 6 values out of 2385.

Asymmetric GARCH Models with Non-Gaussian Innovations

To take into account the leptokurtic shape of the residuals distribution, we re-estimated the fiveGARCH models with a Student t distribution – whose parameter is estimated – for ηt .

Page 284: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

268 GARCH MODELS

Table 10.7 Number of CAC returns outside the limits r ± 3σt (THEO being the theoreticalnumber when the conditional distribution is N(0, σ 2

t ).

THEO GARCH EGARCH QGARCH GJR TGARCH6 17 13 14 15 13

Table 10.8 Means of the squares of the differences between the estimated volatilities (×1010)for the models with Student innovations and the TGARCH model with Gaussian innovations(model (10.34) denoted TGARCHN ).

GARCH EGARCH QGARCH GJR TGARCH TGARCHN

GARCH 0 5.90 2.72 5.89 7.71 15.77EGARCH 5.90 0 2.27 5.08 0.89 8.92QGARCH 2.72 2.27 0 2.34 3.35 9.64GJR 5.89 5.08 2.34 0 7.21 11.46TGARCH 7.71 0.89 3.35 7.21 0 7.75TGARCHN 15.77 8.92 9.64 11.46 7.75 0

For instance, the new estimated TGARCH model isrt = 5× 10−4 + εt , εt = σt ηt , ηt ∼ t (9.7)

(2×10−4)

σt = 4. 10−4 + 0.03 ε+t−1 − 0.10 ε−t−1 + 0.90 σt−1

(1. 10−4) (0.01) (0.02) (0.02).

(10.37)

It can be seen that the estimated volatility is quite different from that obtained with the normaldistribution (see Table 10.8).

Model with Interventions

Analysis of the residuals show that the values observed at times t = 228, 682 and 845 are scarcelycompatible with the selected model. There are two ways to address this issue: one could eitherresearch a new specification that makes those values compatible with the model, or treat thesethree values as outliers for the selected model.

In the first case, one could replace the N(0, 1) distribution of the noise ηt with a more appro-priate (leptokurtic) one. The first difficulty with this is that no distribution is evident for these data(it is clear that distributions of Student t or generalized error type would not provide good approx-imations of the distribution of the standardized residuals). The second difficulty is that changingthe distribution might considerably enlarge the confidence intervals. Take the example of a 99%confidence interval at horizon 1. The initial interval [r − 2.57σt , r + 2.57σt ] simply becomes thedilated interval [r − t0.995σt , r + t0.995σt ] with t0.995 � 2.57, provided that the estimates σt arenot much affected by the change of conditional distribution. Even if the new interval does contain99% of returns, there is a good chance that it will be excessively large for most of the data.

So for this first case we should ideally change the prediction formula for σt so that the estimatedvolatility is larger for the three special data (the resulting smaller standardized residuals (rt −r)/√n would become consistent with the N(0, 1) distribution), without much changing volatilities

estimated for other data. Finding a reasonable model that achieves this change seems quite difficult.We have therefore opted for the second approach, treating these three values as outliers. Con-

ceptually, this amounts to assuming that the model is not appropriate in certain circumstances.One can imagine that exceptional events occurred shortly before the three dates t = 228, 682 and

Page 285: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 269

Table 10.9 SAS program for the fitting of a TGARCH(1, 1)model with interventions.

/* Data reading */

data cac;

infile ’c:\enseignement\PRedess\Garch\cac8898.dat’;

input indice; date=_n_;

run;

/* Estimation of a TGARCH(1,1) model */

proc model data = cac ;

/* Initial values are attributed to the parameters */

parameters cacmod1 -0.075735 cacmod2 -0.064956 cacmod3 -0.0349778 omega .000779460

alpha_plus 0.034732 alpha_moins 0.12200 beta 0.86887 intercept .000426280 ;

/* The index is regressed on a constant and 3 interventions are made*/

if (_obs_ = 682 ) then indice= cacmod1;

else if (_obs_ = 228 ) then indice= cacmod2;

else if (_obs_ = 845 ) then indice= cacmod3;

else indice = intercept ;

/* The conditional variance is modeled by a TGARCH */

if (_obs_ = 1 ) then

if((alpha_plus+alpha_moins)/sqrt(2*constant(’pi’))+beta=1) then

h.indice = (omega + (alpha_plus/2+alpha_moins/2+beta)*sqrt(mse.indice))**2 ;

else h.indice = (omega/(1-(alpha_plus+alpha_moins)/sqrt(2*constant(’pi’))-beta))**2;

else

if zlag(-resid.indice) > 0 then h.indice = (omega + alpha_plus*zlag(-resid.indice)

+ beta*zlag(sqrt(h.indice)))**2 ;

else h.indice = (omega - alpha_moins*zlag(-resid.indice) + beta*zlag(sqrt(h.indice)))**2 ;

/* The model is fitted and the normalized residuals are stored in a SAS table*/

outvars nresid.indice;

fit indice / method = marquardt fiml out=residtgarch ;

run ; quit ;

845. Other special events may occur in the future, and our model will be unable to anticipate thechanges in volatility induced by these extraordinary events. The ideal would be to know the valuesthat returns would have had if these exceptional event had not occurred, and to work with thesecorrected values. This is of course not possible, and we must also estimate the adjusted values.We will use an intervention model, assuming that only the returns of the three dates would havechanged in the absence of the above-mentioned exceptional events. Other types of interventionscan of course be envisaged. To estimate what would have been the returns of the three dates in theabsence of exceptional events, we can add these three values to the parameters of the likelihood.This can easily be done using an SAS program (see Table 10.9).

10.7 Bibliographical Notes

The asymmetric reaction of the volatility to past positive and negative shocks has been welldocumented since the articles by Black (1976) and Christie (1982). These articles use the leverageeffect to explain the fact that the volatility tends to overreact to price decreases, compared to priceincreases of the same magnitude. Other explanations, related to the existence of time-dependentrisk premia, have been proposed; see, for instance, Campbell and Hentschel (1992), Bekaert andWu (2000) and references therein. More recently, Avramov, Chordia and Goyal (2006) advancedan explanation founded on the volume of the daily exchanges. Empirical evidence of asymmetryhas been given in numerous studies: see, for example, Engle and Ng (1993), Glosten and al. (1993),Nelson (1991), Wu (2001) and Zakoıan (1994).

Page 286: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

270 GARCH MODELS

The introduction of ‘news impact curves’ providing a visualization of the different forms ofvolatility is due to Pagan and Schwert (1990) and Engle and Ng (1993). The AVGARCH modelwas introduced by Taylor (1986) and Schwert (1989). The EGARCH model was introduced andstudied by Nelson (1991). The GJR-GARCH model was introduced by Glosten, Jagannathan andRunkle (1993). The TGARCH model was introduced and studied by Zakoıan (1994). This model isinspired by the threshold models of Tong (1978) and Tong and Lim (1980), which are used for theconditional mean. See Goncalves and Mendes Lopes (1994, 1996) for the stationarity study of theTGARCH model. An extension was proposed by Rabemananjara and Zakoıan (1993) in which thevolatility coefficients are not constrained to be positive. The TGARCH model was also extended tothe case of a nonzero threshold by Hentschel (1995), and to the case of multiple thresholds by Liuand al. (1997). Model (10.21), with δ = 2 and ς1 = · · · = ςq , was studied by Straumann (2005).This model is called ‘asymmetric GARCH’ (AGARCH(p, q)) by Straumann, but the acronymAGARCH, which has been employed for several other models, is ambiguous. A variant is thedouble-threshold ARCH (DTARCH) model of Li and Li (1996), in which the thresholds appearboth in the conditional mean and the conditional variance. Specifications making the transitionvariable continuous were proposed by Hagerud (1997), Gonzalez-Rivera (1998) and Taylor (2004).Various classes of models rely on Box–Cox transformations of the volatility: the APARCH modelwas proposed by Higgins and Bera (1992) in its symmetric form (NGARCH model) and thengeneralized by Ding, Granger and Engle (1993); another generalization is that of Hwang and Kim(2004). The qualitative threshold ARCH model was proposed by Gourieroux and Monfort (1992),the quadratic ARCH model by Sentana (1995). The conditional density was modeled by Hansen(1994). The contemporaneous asymmetric GARCH model of Section 10.5 was proposed by ElBabsiri and Zakoıan (2001). In this article, the strict and second-order stationarity conditions wereestablished and the statistical inference was studied. Recent comparisons of asymmetric GARCHmodels were proposed by Awartani and Corradi (2006), Chen, Gerlach and So (2006) and Hansenand Lunde (2005).

10.8 Exercises

10.1 (Noncorrelation between the volatility and past values when the law of ηt is symmetric)Prove the symmetry property (10.1).

10.2 (The expectation of a product of independent variables is not always the product of theexpectations)Find a sequence Xi of independent real random variables such that Y =∏∞

i=1 Xi existsalmost surely, EY and EXi exist for all i, and

∏∞i=1 EXi exists, but such that EY =∏∞

i=1 EXi .

10.3 (Convergence of an infinite product entails convergence of the infinite sum of logarithms)Prove that, under the assumptions of Theorem 10.1, condition (10.6) entails the absoluteconvergence of the series of general term log gη(λi).

10.4 (Variance of an EGARCH)Complete the proof of Theorem 10.1 by showing in detail that (10.7) entails the desiredresult on Eε2

t .

10.5 (A Gaussian EGARCH admits a variance)Show that, for an EGARCH with Gaussian innovations, condition (10.8) for the existenceof a second-order moment is satisfied.

Page 287: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ASYMMETRIES 271

10.6 (ARMA representation for the logarithm of the square of an EGARCH)Compute the ARMA representation of log ε2

t when εt is an EGARCH(1, 1) process withηt Gaussian. Provide an explicit expression by giving numerical values for the EGARCHcoefficients.

10.7 (β-mixing of an EGARCH)Using Exercise 3.5, give simple conditions for an EGARCH(1, 1) process to be geometri-cally β-mixing.

10.8 (Stationarity of a TGARCH)Establish the second-order stationarity condition (10.14) of a TGARCH(1, 1) process.

10.9 (Autocorrelation of the absolute value of a TGARCH)Compute the autocorrelation function of the absolute value of a TGARCH(1, 1) processwhen the noise ηt is Gaussian. Would this computation be feasible for a standard GARCHprocess?

10.10 (A TGARCH is an APARCH)Check that the results obtained for the APARCH(1, 1) model can be used to retrieve thoseobtained for the TGARCH(1, 1) model.

10.11 (Study of a thresold model)Consider the model

εt = σtηt , log σt = ω + α+ηt−1 1l{ηt−1 > 0} −α−ηt−1 1l{ηt−1<0} .

To which class does this model belong? Which constraints is it natural to impose on the coef-ficients? What are the strict and second-order stationarity conditions? Compute Cov(σt , εt−1)

in the case where ηt ∼ N(0, 1), and verify that the model can capture the leverage effect.

Page 288: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 289: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

11

Multivariate GARCH Processes

While the volatility of univariate series has been the focus of the previous chapters, modeling thecomovements of several series is of great practical importance. When several series displayingtemporal or contemporaneous dependencies are available, it is useful to analyze them jointly, byviewing them as the components of a vector-valued (multivariate) process. The standard linearmodeling of real time series has a natural multivariate extension through the framework of thevector ARMA (VARMA) models. In particular, the subclass of vector autoregressive (VAR) modelshas been widely studied in the econometric literature. This extension entails numerous specificproblems and has given rise to new research areas (such as cointegration).

Similarly, it is important to introduce the concept of multivariate GARCH model. For instance,asset pricing and risk management crucially depend on the conditional covariance structure of theassets of a portfolio. Unlike the ARMA models, however, the GARCH model specification does notsuggest a natural extension to the multivariate framework. Indeed, the (conditional) expectationof a vector of size m is a vector of size m, but the (conditional) variance is an m×m matrix.A general extension of the univariate GARCH processes would involve specifying each of them(m+ 1)/2 entries of this matrix as a function of its past values and the past values of theother entries. Given the excessive number of parameters that this approach would entail, it is notfeasible from a statistical point of view. An alternative approach is to introduce some specificationconstraints which, while preserving a certain generality, make these models operational.

We start by reviewing the main concepts for the analysis of the multivariate time series.

11.1 Multivariate Stationary Processes

In this section, we consider a vector process (Xt )t∈Z of dimension m, Xt = (X1t , . . . , Xmt)′. The

definition of strict stationarity (see Chapter 1, Definition 1.1) remains valid for vector processes,while second-order stationarity is defined as follows.

Definition 11.1 (Second-order stationarity) The process (Xt ) is said to be second-order station-ary if:

(i) EX2it <∞, ∀t ∈ Z, i = 1, . . . , m;

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 290: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

274 GARCH MODELS

(ii) EXt = µ, ∀t ∈ Z;

(iii) Cov(Xt ,Xt+h) = E[(Xt − µ)(Xt+h − µ)′] = �(h), ∀t, h ∈ Z.

The function �(·), taking values in the space of m×m matrices, is called the autocovariancefunction of (Xt ).

Obviously �X(h) = �X(−h)′. In particular, �X(0) = Var(Xt ) is a symmetric matrix.The simplest example of a multivariate stationary process is white noise, defined as a sequence

of centered and uncorrelated variables whose covariance matrix is time-independent.The following property can be used to construct stationary processes by linear transformation

of another stationary process.

Theorem 11.1 (Stationary linear filter) Let (Zt ) denote a stationary process, Zt ∈ Rm. Let(Ck)k∈Z denote a sequence of nonrandom n×m matrices, such that, for all i = 1, . . . , n, forall j = 1, . . . , m,

∑k∈Z |c(k)ij | <∞, where Ck = (c

(k)ij ). Then the Rn-valued process defined by

Xt =∑

k∈Z CkZt−k is stationary and we have, in obvious notation,

µX =∑k∈Z

CkµZ, �X(h) =∑k,l∈Z

Cl�Z(h+ k − l)C′k.

The proof of an analogous result is given by Brockwell and Davis (1991, pp. 83–84) and thearguments used extend straightforwardly to the multivariate setting. When, in this theorem, (Zt ) isa white noise and Ck = 0 for all k < 0, (Xt ) is called a vector moving average process of infiniteorder, VMA(∞). A multivariate extension of Wold’s representation theorem (see Hannan, 1970,pp. 157–158) states that if (Xt ) is a stationary and purely nondeterministic process, it can berepresented as an infinite-order moving average,

Xt =∞∑k=0

Ckεt−k := C(B)εt , C0 = Im, (11.1)

where (εt ) is an (m× 1) white noise, B is the lag operator, C(B) =∑∞k=0 CkB

k , and the matricesCk are not necessarily absolutely summable but satisfy the (weaker) condition

∑∞k=0 ‖Ck‖2 <∞,

for any matrix norm ‖ · ‖. The following definition generalizes the notion of a scalar ARMAprocess to the multivariate case.

Definition 11.2 (VARMA(p, q) process) An Rm-valued process (Xt )t∈Z is called a vectorARMA process of orders p and q (VARMA(p, q)) if (Xt )t∈Z is a stationary solution to thedifference equation

�(B)Xt = c +�(B)εt , (11.2)

where (εt ) is an (m× 1) white noise with covariance matrix �, c is an m× 1 vector, and �(z) =Im −�1z− · · · −�pz

p and �(z) = Im −�1z− · · · −�qzq are matrix-valued polynomials.

Denote by det(A), or more simply |A| when there is no ambiguity, the determinant of a squarematrix A. A sufficient condition for the existence of a stationary and invertible solution to thepreceding equation is

|�(z)||�(z)| = 0, for all z ∈ C such that |z| ≤ 1

(see Brockwell and Davis, 1991, Theorems 11.3.1 and 11.3.2).

Page 291: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 275

When p = 0, the process is called vector moving average of order q (VMA(q)); when q = 0,the process is called vector autoregressive of order p (VAR(p)).

Note that the determinant |�(z)| is a polynomial admitting a finite number of roots z1, . . . , zmp.Let δ = mini |zi |> 1. The power series expansion

�(z)−1 := |�(z)|−1�∗(z) :=∞∑k=0

Ckzk, (11.3)

where A∗ denotes the adjoint of the matrix A (that is, the transpose of the matrix of the cofactorsof A), is well defined for |z| < δ, and is such that �(z)−1�(z) = I . The matrices Ck are recursivelyobtained by

C0 = I, and for k ≥ 1, Ck =min(k,p)∑ =1

Ck− � . (11.4)

11.2 Multivariate GARCH Models

As in the univariate case, we can define multivariate GARCH models by specifying their first twoconditional moments. An Rm-valued GARCH process (εt ), with εt = (ε1t , . . . , εmt )

′, must thensatisfy, for all t ∈ Z,

E(εt | εu, u < t) = 0, Var(εt | εu, u < t) = E(εtε′t | εu, u < t) = Ht . (11.5)

The multivariate extension of the notion of the strong GARCH process is based on an equationof the form

εt = H1/2t ηt , (11.6)

where (ηt ) is a sequence of iid Rm-valued variables with zero mean and identity covariance matrix.The matrix H 1/2

t can be chosen to be symmetric and positive definite1 but it can also be chosen tobe triangular, with positive diagonal elements (see, for instance, Harville, 1997, Theorem 14.5.11).The latter choice may be of interest because if, for instance, H 1/2

t is chosen to be lower triangular,the first component of εt only depends on the first component of ηt . When m = 2, we can thus set

ε1t = h1/211,tη1t ,

ε2t = h12,t

h1/211,t

η1t +(h11,t h22,t−h2

12,th11,t

)1/2

η2t ,(11.7)

where ηit and hij,t denote the generic elements of ηt and Ht .Note that any square integrable solution (εt ) of (11.6) is a martingale difference satis-

fying (11.5).Choosing a specification for Ht is obviously more delicate than in the univariate framework

because: (i) Ht should be (almost surely) symmetric, and positive definite for all t ; (ii) the spec-ification should be simple enough to be amenable to probabilistic study (existence of solutions,stationarity, . . . ), while being of sufficient generality; (iii) the specification should be parsimoniousenough to enable feasible estimation. However, the model should not be too simple to be able tocapture the – possibly sophisticated – dynamics in the covariance structure.

1 The choice is then unique because to any positive definite matrix A, one can associate a unique positivedefinite matrix R such that A = R2 (see Harville, 1997, Theorem 21.9.1). We have R = P!1/2P ′, where !1/2

is a diagonal matrix, with diagonal elements the square roots of the eigenvalues of A, and P is the orthogonalmatrix of the corresponding eigenvectors.

Page 292: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

276 GARCH MODELS

Moreover, it may be useful to have the so-called stability by aggregation property. If εt satisfies(11.5), the process (εt ) defined by εt = Pεt , where P is an invertible square matrix, is such that

E(εt | εu, u < t) = 0, Var(εt | εu, u < t) = Ht = PHtP′. (11.8)

The stability by aggregation of a class of specifications for Ht requires that the conditional variancematrices Ht belong to the same class for any choice of P . This property is particularly relevant infinance because if the components of the vector εt are asset returns, εt is a vector of portfolios ofthe same assets, each of its components consisting of amounts (coefficients of the correspondingrow of P ) of the initial assets.

11.2.1 Diagonal ModelA popular specification, known as the diagonal representation , is obtained by assuming that eachelement hk ,t of the covariance matrix Ht is formulated in terms only of the product of the priork and returns. Specifically,

hk ,t = ωk +q∑i=1

a(i)k εk,t−iε ,t−i +

p∑j=1

b(j)

k hk ,t−j ,

with ωk = ω k , a(i)k = a(i) k , b(j)k = b

(j)

k for all (k, ). For m = 1 this model coincides with theusual univariate formulation. When m> 1 the model obviously has a large number of parametersand will not in general produce positive definite covariance matrices Ht . We have

Ht =

ω11 . . . ω1m...

...

ω1m . . . ωmm

+ q∑i=1

a(i)11 ε

21,t−i . . . a

(i)1mε1,t−iεm,t−i

......

a(i)1mε1,t−iεm,t−i . . . a

(i)mmε

2m,t−i

+p∑j=1

b(j)

11 h11,t−i . . . b(j)

1mh1m,t−i...

...

b(j)

1mh1m,t−i . . . b(j)mmh

2mm,t−i

:= �+

q∑i=1

diag(εt−i )A(i)diag(εt−i )+p∑j=1

B(j) �Ht−j

where � denotes the Hadamard product, that is, the element by element product.2 Thus, in theARCH case (p = 0), sufficient positivity conditions are that � is positive definite and the A(i) arepositive semi-definite, but these constraints do not easily generalize to the GARCH case. We shallgive further positivity conditions obtained by expressing the model in a different way, viewing itas a particular case of a more general class.

It is easy to see that the model is not stable by aggregation: for instance, the conditionalvariance of ε1,t + ε2,t can in general be expressed as a function of the ε2

1,t−i and ε22,t−i , but not

of the (ε1,t−i + ε2,t−i)2. A final drawback of this model is that there is no interaction between thedifferent components of the conditional covariance, which appears unrealistic for applications tofinancial series.

In what follows we present the main specifications introduced in the literature, before turningto the existence of solutions. Let η denote a probability distribution on Rm, with zero mean andunit covariance matrix.

2 For two matrices A = (aij ) and B = (bij ) of the same dimension, A� B = (aij bij ).

Page 293: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 277

11.2.2 Vector GARCH ModelThe vector GARCH (VEC-GARCH) model is the most direct generalization of univariate GARCH:every conditional covariance is a function of lagged conditional variances as well as lagged cross-products of all components. In some sense, everything is explained by everything, which makesthis model very general but also not very parsimonious.

Denote by vech(·) the operator that stacks the columns of the lower triangular part of itsargument square matrix (if A = (aij ), then vech(A) = (a11, a21, . . . , am1, a22, . . . , am2, . . . , amm)

′).The next definition is a natural extension of the standard GARCH(p, q) specification.

Definition 11.3 (VEC-GARCH(p, q) process) Let (ηt ) be a sequence of iid variables with dis-tribution η. The process (εt ) is said to admit a VEC-GARCH(p, q) representation (relative to thesequence (ηt )) if it satisfies

εt = H1/2t ηt , where Ht is positive definite such that

vech(Ht ) = ω +q∑i=1

A(i)vech(εt−iε′t−i )+p∑j=1

B(j)vech(Ht−j )(11.9)

where ω is a vector of size {m(m+ 1)/2} × 1, and the A(i) and B(j) are matrices of dimensionm(m+ 1)/2×m(m+ 1)/2.

Remark 11.1 (The diagonal model is a special case of the VEC-GARCH model) The diag-onal model admits a vector representation, obtained for diagonal matrices A(i) and B(j).

We will show that the class of VEC-GARCH models is stable by aggregation. Recall that thevec(·) operator converts any matrix to a vector by stacking all the columns of the matrix into onevector. It is related to the vech operator by the formulas

vec(A) = DmvechA, vech(A) = D+mvecA, (11.10)

where A is any m×m symmetric matrix, Dm is a full-rank m2 ×m(m+ 1)/2 matrix (the so-called ‘duplication matrix’), whose entries are only 0 and 1, D+m = (D′mDm)

−1D′m.3 We also havethe relation

vec(ABC) = (C′ ⊗ A)vec(B), (11.11)

where ⊗ denotes the Kronecker matrix product,4 provided the product ABC is well defined.

Theorem 11.2 (The VEC-GARCH is stable by aggregation) Let (εt ) be a VEC-GARCH(p, q)process. Then, for any invertible m×m matrix P , the process εt = Pεt is a VEC-GARCH(p, q) process.

3 For instance,

D1 = (1), D2 =

1 0 00 1 00 1 00 0 1

, D+2 =1 0 0 0

0 1/2 1/2 00 0 0 1

.

More generally, for i ≥ j , the [(j − 1)m+ i]th and [(i − 1)m+ j ]th rows of Dm equal the m(m+ 1)/2-dimensional row vector all of whose entries are null, with the exception of the [(j − 1)(m− j/2)+ i]th,equal to 1.

4 If A = (aij ) is an m× n matrix and B is an m′ × n′ matrix, A⊗ B is the mm′ × nn′ matrix admittingthe block elements aijB.

Page 294: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

278 GARCH MODELS

Proof. Setting Ht = PHtP′, we have εt = H

1/2t ηt and

vech(Ht ) = D+m(P ⊗ P)Dmvech(Ht )

= ω +q∑i=1

A(i)vech(εt−i ε′t−i )+p∑j=1

B(j)vech(Ht−j ),

whereω = D+m(P ⊗ P)Dmω,

A(i) = D+m(P ⊗ P)DmA(i)D+m(P

−1 ⊗ P−1)Dm,

B(i) = D+m(P ⊗ P)DmB(i)D+m(P

−1 ⊗ P−1)Dm.

To derive the form of A(i) we use

vec(εt ε′t ) = vec(P−1εt ε′tP−1′) = (P−1 ⊗ P−1)Dmvech(εt ε′t ),

and for B(i) we use

vec(Ht ) = vec(P−1HtP−1′) = (P−1 ⊗ P−1)Dmvech(Ht ).

Positivity Conditions

We now seek conditions ensuring the positivity of Ht . A generic element of

ht = vech(Ht )

is denoted by hk ,t (k ≥ ) and we will denote by a(i)

k ,k′ ′ (b(j)k ,k′ ′ ) the entry of A(i) (B(j)) located

on the same row as hk ,t and belonging to the same column as the element hk′ ′,t of h′t . We thushave an expression of the form

hk ,t = Cov(εkt , ε t | εu, u < t)

= ωk +q∑i=1

m∑k′, ′=1k′≥ ′

a(i)

k ,k′ ′εk′,t−i ε ′,t−i +p∑j=1

m∑k′, ′=1k′≥ ′

b(j)

k ,k′ ′hk′ ′,t−j .

Denoting by A(i)k the m×m symmetric matrix with (k′, ′)th entries a(i)

k ,k′ ′/2, for k′ = ′, and the

elements a(i)k ,k′k′ on the diagonal, the preceding equality is written as

hk ,t = ωk +q∑i=1

ε′t−iA(i)k εt−i +

p∑j=1

m∑k′, ′=1k′≥ ′

b(j)

k ,k′ ′hk′ ′,t−j . (11.12)

In order to obtain a more compact form for the last part of this expression, let us introduce thespectral decomposition of the symmetric matrices Ht , assumed to be positive semi-definite. Wehave Ht =

∑mr=1 λ

(r)t v

(r)t v

(r)t

′, where Vt = (v

(1)t , . . . , v

(m)t ) is an orthogonal matrix of eigenvectors

v(r)t associated with the (positive) eigenvalues λ(r)t of Ht . Defining the matrices B(j)

k by analogywith the A(i)

k , we get

hk ,t = ωk +q∑i=1

ε′t−iA(i)k εt−i +

p∑j=1

m∑r=1

λ(r)t−j v

(r)t−j

′B(j)

k v(r)t−j . (11.13)

Page 295: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 279

Finally, consider the m2 ×m2 matrix admitting the block form Ai = (A(i)k ), and let Bj = (B

(j)

k ).The preceding expressions are equivalent to

Ht =m∑r=1

λ(r)t v

(r)t v

(r)t

= �+q∑i=1

(Im ⊗ ε′t−i )Ai(Im ⊗ εt−i )

+p∑

j=1

m∑r=1

λ(r)t−j (Im ⊗ v

(r)t−j

′)Bj (Im ⊗ v

(r)t−j ), (11.14)

where � is the symmetric matrix such that vech(�) = ω.In this form, it is evident that the assumption

Ai and Bj are positive semi-definite and � is positive definite (11.15)

ensures that if the Ht−j are almost surely positive definite, then so is Ht .

Example 11.1 (Three representations of a vector ARCH(1) model) For p = 0, q = 1 andm = 2, the conditional variance is written, in the form (11.9), as

vech(Ht ) = h11,t

h12,t

h22,t

= ω11

ω12

ω22

+ a11,11 a11,12 a11,22

a12,11 a12,12 a12,22

a22,11 a22,12 a22,22

ε21,t−1

ε1,t−1ε2,t−1

ε22,t−1

,in the form (11.12) as

hk ,t = ωk + (ε1,t−1, ε2,t−1)

[ak ,11

ak ,122

ak ,122 ak ,22

][ε1,t−1

ε2,t−1

], k, = 1, 2,

and in the form (11.14) as

Ht =[ω11 ω12

ω12 ω22

]+ (I2 ⊗ ε′t−1)

a11,11

a11,122 a12,11

a12,122

a11,122 a11,22

a12,122 a12,22

a12,11a12,12

2 a22,11a22,12

2

a12,122 a12,22

a22,122 a22,22

(I2 ⊗ εt−1).

This example shows that, even for small orders, the VEC model potentially has an enormousnumber of parameters, which can make estimation of the parameters computationally demanding.Moreover, the positivity conditions are not directly obtained from (11.9) but from (11.14), involvingthe spectral decomposition of the matrices Ht−j .

The following classes provide more parsimonious and tractable models.

11.2.3 Constant Conditional Correlations ModelsSuppose that, for a multivariate GARCH process of the form (11.6), all the past information onεkt , involving all the variables ε ,t−i , is summarized in the variable hkk,t , with Ehkk,t = Eε2

kt .Then, letting ηkt = h

−1/2kk,t εkt , we define for all k a sequence of iid variables with zero mean

Page 296: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

280 GARCH MODELS

and unit variance. The variables ηkt are generally correlated, so let R = Var(ηt ) = (ρk ), whereηt = (η1t , . . . , ηmt )

′. The conditional variance of

εt = diag(h1/211,t , . . . , h

1/2mm,t )ηt

is then written as

Ht = diag(h1/211,t , . . . , h

1/2mm,t )R diag(h1/2

11,t , . . . , h1/2mm,t ). (11.16)

By construction, the conditional correlations between the components of εt are time-invariant:

hk ,t

h1/2kk,th

1/2 ,t

= E(εkt ε t | εu, u < t)

{E(ε2kt | εu, u < t)E(ε2

t | εu, u < t)}1/2= ρkl.

To complete the specification, the dynamics of the conditional variances hkk,t has to be defined.The simplest constant conditional correlations (CCC) model relies on the following univariateGARCH specifications:

hkk,t = ωk +∑q

i=1 ak,iε2k,t−i +

∑p

j=1 bk,jhkk,t−j , k = 1, . . . , m, (11.17)

where ωk > 0, ak,i ≥ 0, bk,j ≥ 0, −1 ≤ ρk ≤ 1, ρkk = 1, and R is symmetric and positive semi-definite. Observe that the conditional variances are specified as in the diagonal model. Theconditional covariances clearly are not linear in the squares and cross products of the returns.

In a multivariate framework, it seems natural to extend the specification (11.17) by allowinghkk,t to depend not only on its own past, but also on the past of all the variables ε ,t . Set

ht =

h11,t...

hmm,t

, Dt =

√h11,t 0 . . . 0

0. . .

.... . .

0 . . .√hmm,t

, εt =

ε21t...

ε2mt

.

Definition 11.4 (CCC-GARCH(p, q) process) Let (ηt ) be a sequence of iid variables with dis-tribution η. A process (εt ) is called CCC-GARCH(p, q) if it satisfies

εt = H1/2t ηt ,

Ht = DtRDt

ht = ω +q∑i=1

Aiεt−i +p∑j=1

Bj ht−j ,

(11.18)

where R is a correlation matrix, ω is a m× 1 vector with positive coefficients, and the Ai and Bj

are m×m matrices with nonnegative coefficients.

We have εt = Dt ηt , where ηt = R1/2ηt is a centered vector with covariance matrix R. Thecomponents of εt thus have the usual expression, εkt = h

1/2kk,t ηkt , but the conditional variance hkk,t

depends on the past of all the components of εt .Note that the conditional covariances are generally nonlinear functions of the components of

εt−i ε′t−i and of past values of the components of Ht . Model (11.18) is thus not a VEC-GARCHmodel, defined by (11.9), except when R is the identity matrix.

Page 297: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 281

One advantage of this specification is that a simple condition ensuring the positive definitenessof Ht is obtained through the positive coefficients for the matrices Ai and Bj and the choice ofa positive definite matrix for R. We shall also see that the study of the stationarity is remarkablysimple.

Two limitations of the CCC model are, however, (i) its nonstability by aggregation and (ii) thearbitrary nature of the assumption of constant conditional correlations.

11.2.4 Dynamic Conditional Correlations ModelsDynamic conditional correlations GARCH (DCC-GARCH) models are an extension of CCC-GARCH, obtained by introducing a dynamic for the conditional correlation. Hence, the constantmatrix R in Definition 11.4 is replaced by a matrix Rt which is measurable with respect to the pastvariables {εu, u < t}. For reasons of parsimony, it seems reasonable to choose diagonal matrices Ai

and Bi in (11.18), corresponding to univariate GARCH models for each component as in (11.17).Different DCC models are obtained depending on the specification of Rt . A simple formulation is

Rt = θ1R + θ2�t−1 + θ3Rt−1, (11.19)

where the θi are positive weights summing to 1, R is a constant correlation matrix, and �t−1 isthe empirical correlation matrix of εt−1, . . . , εt−M . The matrix Rt is thus a correlation matrix (seeExercise 11.9). Equation (11.19) is reminiscent of the GARCH(1, 1) specification, θ1R playing therole of the parameter ω, θ2 that of α, and θ3 that of β.

Another way of specifying the dynamics of Rt is by setting

Rt = (diagQt)−1/2Qt(diagQt)

−1/2,

where diagQt is the diagonal matrix constructed with the diagonal elements of Qt , and Qt isa sequence of covariance matrices which is measurable with respect to σ (εu, u < t). A naturalparameterization is

Qt = θ1Q+ θ2εt−1ε′t−1 + θ3Qt−1, (11.20)

where Q is a covariance matrix. Again, the formulation recalls the GARCH(1, 1) model. Thoughdifferent, both specifications (11.19) and (11.20) allow us to test the assumption of constant condi-tional covariance matrix, by considering the restriction θ2 = θ3 = 0. Note that the same θ2 and θ3

coefficients appear in the different conditional correlations, which thus have very similar dynamics.The matrices R and Q are often estimated/replaced by the empirical correlation and covariancematrices. In this approach a DCC model of the form (11.19) or (11.20) thus introduces only twomore parameters than the CCC formulation.

11.2.5 BEKK-GARCH ModelThe BEKK acronym refers to a specific parameterization of the multivariate GARCH model devel-oped by Baba, Engle, Kraft and Kroner, in a preliminary version of Engle and Kroner (1995).

Definition 11.5 (BEKK-GARCH(p, q) process) Let (ηt ) denote an iid sequence with commondistribution η. The process (εt ) is called a strong GARCH(p, q), with respect to the sequence (ηt ),if it satisfies

εt = H1/2t ηt

Ht = �+q∑i=1

K∑k=1

Aikεt−iε′t−iA′ik +

p∑j=1

K∑k=1

BjkHt−jB ′jk,

where K is an integer, �, Aik and Bjk are square m×m matrices, and � is positive definite.

Page 298: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

282 GARCH MODELS

The specification obviously ensures that if the matrices Ht−i , i = 1, . . . , p, are almost surelypositive definite, then so is Ht .

To compare this model with the representation (11.9), let us derive the vector form of theequation for Ht . Using the relations (11.10) and (11.11), we get

vech(Ht ) = vech(�)+q∑i=1

D+mK∑k=1

(Aik ⊗ Aik)Dmvech(εt−i ε′t−i )

+p∑j=1

D+mK∑k=1

(Bjk ⊗ Bjk)DmvechHt−j .

The model can thus be written in the form (11.9), with

A(i) = D+mK∑k=1

(Aik ⊗ Aik)Dm, B(j) = D+mK∑k=1

(Bjk ⊗ Bjk)Dm, (11.21)

for i = 1, . . . , q and j = 1, . . . , p. In particular, it can be seen that the number of coefficients ofa matrix A(i) in (11.9) is [m(m+ 1)/2]2, whereas it is Km2 in this particular case.

The BEKK class contains (Exercise 11.13) the diagonal models obtained by choosing diagonalmatrices Aik and Bjk . The following theorem establishes a converse to this property.

Theorem 11.3 For the model defined by the diagonal vector representation (11.9) with

A(i) = diag{vech

(A(i)

)}, B(j) = diag

{vech

(B(j)

)},

where A(i) = (a(i)

′) and B(j) = (b(j)

′ ) are m×m symmetric positive semi-definite matrices, thereexist matrices Aik and Bjk such that (11.21) holds, for K = m.

Proof. There exists an upper triangular matrix

D(i) =

d(i)11,m d

(i)11,m−1 . . . d

(i)11,1

0 d(i)22,m−1 . . . d

(i)22,1

......

0 0 . . . d(i)mm,1

such that A(i) = D(i){D(i)}′. Let Aik = diag(d(i)11,k, d

(i)22,k, . . . , d

(i)rr,k, 0, . . . , 0) where r = m− k + 1,

for k = 1, . . . , m. It is easy to show that the first equality in (11.21) is satisfied with K = m. Thesecond equality is obtained similarly. �

Example 11.2 By way of illustration, consider the particular case where m = 2, q = K = 1 andp = 0. If A = (aij ) is a 2× 2 matrix, it is easy to see that

D+2 (A⊗ A)D2 = a2

11 2a11a12 a212

a11a21 a11a22 + a12a21 a12a22

a221 2a21a22 a2

22

.Hence, canceling out the unnecessary indices,

h11,t = ω11 + a211ε

21,t−1 + 2a11a12ε1,t−1ε2,t−1 + a2

12ε22,t−1

h12,t = ω12 + a11a21ε21,t−1 + (a11a22 + a12a21)ε1,t−1ε2,t−1 + a12a22ε

22,t−1

h22,t = ω22 + a221ε

21,t−1 + 2a21a22ε1,t−1ε2,t−1 + a2

22ε22,t−1.

Page 299: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 283

In particular, the diagonal models belonging to this class are of the formh11,t = ω11 + a2

11ε21,t−1

h12,t = ω12 + a11a22ε1,t−1ε2,t−1

h22,t = ω22 + a222ε

22,t−1.

Remark 11.2 (Interpretation of the BEKK coefficients) Example 11.2 shows that the BEKKspecification imposes highly artificial constraints on the volatilities and covolatilities of the com-ponents. As a consequence, the coefficients of a BEKK representation are difficult to interpret.

Remark 11.3 (Identifiability) Identifiability of a BEKK representation requires additional con-straints. Indeed, the same representation holds if Aik is replaced by −Aik , or if the matricesA1k, . . . , Aqk and A1k′ , . . . , Aqk′ are permuted for k = k′.

Example 11.3 (A general and identifiable BEKK representation) Consider the case m = 2,q = 1 and p = 0. Suppose that the distribution η is nondegenerate, so that there exists nonontrivial constant linear combination of a finite number of the εk,t−i ε ,t−i . Let

Ht = �+m2∑k=1

Akεt−1ε′t−1A

′k,

where � is a symetric positive definite matrix,

A1 =(a11,1 a12,1

a21,1 a22,1

), A2 =

(0 0

a21,2 a22,2

),

A3 =(

0 a12,3

0 a22,3

), A4 =

(0 00 a22,4

),

with a11,1 ≥ 0, a12,3 ≥ 0, a21,2 ≥ 0 and a22,4 ≥ 0.Let us show that this BEKK representation is both identifiable and quite general. Easy, but

tedious, computation shows that an expression of the form (11.9) holds with

A(1) =4∑

k=1

D+2 (Ak ⊗ Ak)D2

= a2

11,1 2a11,1a12,1 a212,1 + a2

12,3a11,1a21,1 a12,1a21,1 + a11,1a22,1 a12,1a22,1 + a12,3a22,3

a221,1 + a2

21,2 2a21,1a22,1 + 2a21,2a22,2 a222,1 + a2

22,2 + a222,3 + a2

22,4

.

In view of the sign constraint, the (1, 1)th element of A(1) allows us to identify a11,1. The (1, 2)thand (2, 1)th elements then allow us to find a12,1 and a21,1, whence the (2, 2)th element yields a22,1.The two elements of A3 are deduced from the (1, 3)th and (2, 3)th elements of A(1), and from theconstraint a12,3 ≥ 0 (which could be replaced by a constraint on the sign of a22,3). A2 is identifiedsimilarly, and the nonzero element of A4 is finally identified by considering the (3, 3)th elementof A(1).

In this example, the BEKK representation contains the same number of parameters as thecorresponding VEC representation, but has the advantage of automatically providing a positivedefinite solution Ht .

It is interesting to consider the stability by aggregation of the BEKK class.

Page 300: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

284 GARCH MODELS

Theorem 11.4 (Stability of the BEKK model by aggregation) Let (εt ) be a BEKK-GARCH(p, q) process. Then, for any invertible m×m matrix P , the process εt = Pεt is a BEKK-GARCH(p, q) process.

Proof. Letting Ht = PHtP′, � = P�P ′, Aik = PAikP

−1, and Bjk = PBjkP−1, we get

εt = H1/2t ηt , Ht = �+

q∑i=1

K∑k=1

Aikεt−i ε′t−i A′ik +

p∑j=1

K∑k=1

BjkHt−i B ′jk

and, � being a positive definite matrix, the result is proved. �

As in the univariate case, the ‘square’ of the (εt ) process is the solution of an ARMA model.Indeed, define the innovation of the process vec(εt ε′t ):

νt = vec(εt ε′t )− vec[E(εtε′t |εu, u < t)] = vec(εt ε′t )− vec(Ht ). (11.22)

Applying the vec operator, and substituting the variables vec(Ht−j ) in the model of Definition11.5 by vec(εt−j ε′t−j )− νt−j , we get the representation

vec(εt ε′t ) = vec(�)+

r∑i=1

K∑k=1

{(Aik + Bik)⊗ (Aik + Bik)} vec(εt−i ε′t−i )

+νt −p∑j=1

K∑k=1

(Bjk ⊗ Bjk)νt−j , t ∈ Z, (11.23)

where r = max(p, q), with the convention Aik = 0 (Bjk = 0) if i > q (j >p). This representationcannot be used to obtain stationarity conditions because the process (νt ) is not iid in general.However, it can be used to derive the second-order moment, when it exists, of the process εt as

E{vec(εt ε′t )} = vec(�)+

r∑i=1

K∑k=1

{(Aik + Bik)⊗ (Aik + Bik)}E{vec(εt ε′t )},

that is,

E{vec(εt ε′t )} =

{I −

r∑i=1

K∑k=1

(Aik + Bik)⊗ (Aik + Bik)

}−1

vec(�),

provided that the matrix in braces is nonsingular.

11.2.6 Factor GARCH ModelsIn these models, it is assumed that a nonsingular linear combination ft of the m componentsof εt , or an exogenous variable summarizing the comovements of the components, has aGARCH structure.

Factor models with idiosyncratic noise

A very popular factor model links individual returns εit to the market return ft through a regres-sion model

εit = βift + ηit , i = 1, . . . , m. (11.24)

The parameter βi can be interpreted as a sensitivity to the factor, and the noise ηit as a specificrisk (often called idiosyncratic risk) which is conditionally uncorrelated with ft . It follows thatHt = �+ λtββ ′ where β is the vector of sensitivities, λt is the conditional variance of ft and �

Page 301: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 285

is the covariance matrix of the idiosyncratic terms. More generally, assuming the existence of rconditionally uncorrelated factors, we obtain the decomposition

Ht = �+r∑

j=1

λjtβjβ′j . (11.25)

It is not restrictive to assume that the factors are linear combinations of the components of εt(Exercise 11.10). If, in addition, the conditional variances λjt are specified as univariate GARCH,the model remains parsimonious in terms of unknown parameters and (11.25) reduces to a par-ticular BEKK model (Exercise 11.11). If � is chosen to be positive definite and if the univariateseries (λjt )t , j = 1, . . . , r , are independent, strictly and second-order stationary, then it is clearthat (11.25) defines a sequence of positive definite matrices (Ht ) that are strictly and second-order stationary.

Principal components GARCH model

The concept of factor is central to principal components analysis (PCA) and to other methods ofexploratory data analysis. PCA relies on decomposing the covariance matrix V of m quantitativevariables as V = P!P ′, where ! is a diagonal matrix whose elements are the eigenvalues λ1 ≥λ2 ≥ · · · ≥ λm of V , and where P is the orthonormal matrix of the corresponding eigenvectors.The first principal component is the linear combination of the m variables, with weights given bythe first column of P , which, in some sense, is the factor which best summarizes the set of mvariables (Exercise 11.12). There exist m principal components, which are uncorrelated and whosevariances λ1, . . . , λm (and hence whose explanatory powers) are in decreasing order. It is naturalconsidering this method for extracting the key factors of the volatilities of the m componentsof εt .

We obtain a principal component GARCH (PC-GARCH) or orthogonal GARCH (O-GARCH)model by assuming that

Ht = P!tP′, (11.26)

where P is an orthogonal matrix (P ′ = P−1) and !t = diag (λ1t , . . . , λmt ), where the λit are thevolatilities, which can be obtained from univariate GARCH-type models. This is equivalent toassuming

εt = P ft , (11.27)

where ft = P ′εt is the principal component vector, whose components are orthogonal factors. Ifunivariate GARCH(1, 1) models are used for the factors fit =

∑mj=1 P(j, i)εjt , then

λit = ωi + αif2it−1 + βiλit−1. (11.28)

Remark 11.4 (Interpretation, factor estimation and extensions)

1. Model (11.26) can also be interpreted as a full-factor GARCH (FF-GARCH) model, thatis, a model with as many factors as components and no idiosyncratic term. Let P(·, j) bethe j th column of P (an eigenvector of Ht associated with the eigenvalue λjt ). We get aspectral expression for the conditional variance,

Ht =m∑j=1

λjtP (·, j)P ′(·, j),

which is of the form (11.25) with an idiosyncratic variance � = 0.

2. A PCA of the conditional variance Ht should, in full generality, give Ht = Pt!tP′t with

factors (that is, principal components) ft = P ′t εt . Model (11.26) thus assumes that all factorsare linear combinations, with fixed coefficients, of the same returns εit . For instance, the first

Page 302: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

286 GARCH MODELS

factor f1t is the conditionally most risky factor (with the largest conditional variance λ1t , seeExercise 11.12). But since it is assumed that the direction of f1t is fixed, in the subspace ofRm generated by the components of εit , the first factor is also the most risky unconditionally.This can be seen through the PCA of the unconditional variance H = EHt = P!P ′, whichis assumed to exist.

3. It is easy to estimate P by applying PCA to the empirical variance H = n−1∑nt=1(εt −

ε)(εt − ε)′, where ε = n−1∑nt=1 εt . The components of P ′εt are specified as GARCH-

type univariate models. Estimation of the conditional variance Ht = P !t P′ thus reduces

to estimating m univariate models.

4. It is common practice to apply PCA on centered and standardized data, in order to removethe influence of the units of the various variables. For returns εit , standardization does notseem appropriate if one wishes to retain a size effect, that is, if one expects an asset witha relatively large variance to have more weight in the riskier factors.

5. In the spirit of the standard PCA, it is possible to only consider the first r principal com-ponents, which are the key factors of the system. The variance Ht is thus approximated by

Pdiag(λ1t , . . . , λrt , 0′m−r )P′, (11.29)

where the λit are estimated from simple univariate models, such as GARCH(1, 1) modelsof the form (11.28), the matrix P is obtained from PCA of the empirical covariance matrixH = Pdiag(λ1, . . . , λm)P

′, and the factors are approximated by ft = P ′εt . Instead of theapproximation (11.29), one can use

Ht = Pdiag(λ1t , . . . , λrt , λr+1, . . . , λm)P′. (11.30)

The approximation in (11.30) is as simple as (11.29) and does not require additional com-putations (in particular, the r GARCH equations are retained) but has the advantage ofproviding an almost surely invertible estimation of Ht (for fixed n), which is required inthe computation of certain statistics (such as the AIC-type information criteria based on theGaussian log-likelihood).

6. Note that the assumption that P is orthogonal can be restrictive. The class of gen-eralized orthogonal GARCH (GO-GARCH) processes assumes only that P is anynonsingular matrix.

11.3 Stationarity

In this section, we will first discuss the difficulty of establishing stationarity conditions, or theexistence of moments, for multivariate GARCH models. For the general vector model (11.9), and inparticular for the BEKK model, there exist sufficient stationarity conditions. The stationary solutionbeing nonexplicit, we propose an algorithm that converges, under certain assumptions, to thestationary solution. We will then see that the problem is much simpler for the CCC model (11.18).

11.3.1 Stationarity of VEC and BEKK ModelsIt is not possible to provide stationary solutions, in explicit form, for the general VEC model (11.9).To illustrate the difficulty, recall that a univariate ARCH(1) model admits a solution εt = σtηt withσt explicitly given as a function of {ηt−u, u> 0} as the square root of

σ 2t = ω + αη2

t σ2t−1 = ω

{1+ αη2

t + α2η2t η

2t−1 + · · ·

},

Page 303: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 287

provided that the series converges almost surely. Now consider a bivariate model of the form(11.6) with Ht = I2 + αεt−1ε

′t−1, where α is assumed, for the sake of simplicity, to be scalar and

positive. Also choose H 1/2t to be lower triangular so as to have (11.7). Then

h11,t = 1+ αh11,t−1η21,t−1

h12,t = αh12,t−1η21,t−1 + α

(h11,t−1h22,t−1 − h2

12,t−1

)1/2η1,t−1η2,t−1

h22,t = 1+ αh2

12,t−1h11,t−1

η21,t−1 + α

h11,t−1h22,t−1−h212,t−1

h11,t−1η2

2,t−1

+2αh12,t−1

(h11,t−1h22,t−1−h2

12,t−1

)1/2

h11,t−1η1,t−1η2,t−1.

It can be seen that, given ηt−1, the relationship between h11,t and h11,t−1 is linear, and can beiterated to yield

h11,t = 1+∞∑i=1

αii∏

j=1

η21,t−j

under the constraint α < exp(−E log η21t ). In contrast, the relationships between h12,t , or h22,t , and

the components of Ht−1 are not linear, which makes it impossible to express h12,t and h22,t as asimple function of α, {ηt−1, ηt−2, . . . , ηt−k} and Ht−k for k ≥ 1. This constitutes a major obstaclefor determining sufficient stationarity conditions.

Remark 11.5 (Stationarity does not follow from the ARMA model) Similar to (11.22), let-ting νt = vech(εt ε′t )− vech(Ht ), we obtain the ARMA representation

vech(εt ε′t ) = ω +

r∑i=1

C(i)vech(εt−i ε′t−i )+ νt −p∑j=1

B(j)νt−j ,

by setting C(i) = A(i) + B(i) and by using the usual notation and conventions. In the literature,one may encounter the argument that the model is weakly stationary if the polynomial z �→det(Is −

∑ri=1 C

(i)zi)

has all its roots outside the unit circle (s = m(m+ 1)/2). Although theresult is certainly true with additional assumptions on the noise density (see Theorem 11.5 and thesubsequent discussion), the argument is not correct since

vech(εt ε′t ) =

(r∑i=1

C(i)Bi

)−1ω + νt −

p∑j=1

B(j)νt−j

constitutes a solution only if νt = vech(εt ε′t )− vech(Ht ) can be expressed as a function of{ηt−u, u> 0}.

Boussama (2006) obtained the following stationarity condition. Recall that ρ(A) denotes the spec-tral radius of a square matrix A.

Theorem 11.5 (Stationarity and ergodicity) There exists a strictly stationary and nonanticipa-tive solution of the vector GARCH model (11.9), if:

(i) the positivity condition (11.15) is satisfied;

(ii) the distribution of η has a density, positive on a neighborhood of 0, with respect to theLebesgue measure on Rm;

(iii) ρ(∑r

i=1 C(i))< 1.

This solution is unique, β-mixing and ergodic.

Page 304: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

288 GARCH MODELS

In the particular case of the BEKK model (11.21), condition (iii) takes the form

ρ

(D+m

K∑k=1

r∑i=1

(Aik ⊗ Aik + Bik ⊗ Bik)Dm

)< 1.

The proof of Theorem 11.5 relies on sophisticated algebraic tools. Assumption (ii) is a standardtechnical condition for showing the β-mixing property (but is of no use for stationarity). Note thatcondition (iii), written as

∑ri=1 αi + βi < 1 in the univariate case, is generally not necessary for

the strict stationarity.This theorem does not provide explicit stationary solutions, that is, a relationship between εt

and the ηt−i . However, it is possible to construct an algorithm which, when it converges, allowsa stationary solution to the vector GARCH model (11.9) to be defined.

Construction of a stationary solution

For any t, k ∈ Z, we defineε(k)t = H

(k)t = 0, when k < 0,

and, recursively on k ≥ 0,

vech(H(k)t ) = ω +

q∑i=1

A(i)vech(ε(k−i)t−i ε(k−i)′t−i )+

p∑j=1

B(j)vech(H(k−j)t−j ), (11.31)

with ε(k)t = H

(k) 1/2t ηt .

Observe that, for k ≥ 1,

H(k)t = fk(ηt−1, . . . , ηt−k) and EH

(k)t = H(k),

where fk is a measurable function and H(k) is a square matrix. (H (k) 1/2t )t and (ε

(k)t )t are thus

stationary processes whose components take values in the Banach space L2 of the (equivalenceclasses of) square integrable random variables. It is then clear that (11.9) admits a strictly stationarysolution, which is nonanticipative and ergodic, if, for all t ,

H(k)t converges almost surely when k→∞. (11.32)

Indeed, letting H1/2t = limk→∞H

(k) 1/2t and εt = H

1/2t ηt , and taking the limit of each side of

(11.31), we note that (11.9) is satisfied. Moreover, (εt ) constitutes a strictly stationary and nonan-ticipative solution, because εt is a measurable function of {ηu, u ≤ t}. In view of Theorem A.1,such a process is also ergodic. Note also that if Ht exists, it is symmetric and positive definitebecause the matrices H(k)

t are symmetric and satisfy

λ′H(k)t λ ≥ λ′�λ> 0, for λ = 0.

This solution (εt ) is also second-order stationary if

H(k)t converges in L1 when k →∞. (11.33)

Let�(k)t = vech

∣∣∣H(k)t −H

(k−1)t

∣∣∣ .From Exercise 11.8 and its proof, we obtain (11.32), and hence the existence of strictly stationarysolution to the vector GARCH equation (11.9), if there exists ρ ∈]0, 1[ such that ‖�(k)

t ‖ = O(ρk)

almost surely as k→∞, which is equivalent to

limk→∞

1

klog ‖�(k)

t ‖ < 0, a.s. (11.34)

Page 305: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 289

Similarly, we obtain (11.33) if ‖E�(k)t ‖ = O(ρk). The criterion in (11.34) is not very explicit

but the left-hand side of the inequality can be evaluated by simulation, just as for a Lyapunovcoefficient.

11.3.2 Stationarity of the CCC ModelIn model (11.18), letting ηt = R1/2ηt , we get

εt = ϒtht , where ϒt =

η2

1t 0 . . . 0

0. . .

.... . .

0 . . . η2mt

.

Multiplying by ϒt the equation for ht , we thus have

εt = ϒt ω +q∑i=1

ϒtAiεt−i +p∑j=1

ϒtBj ht−j ,

which can be written

zt= bt + Atzt−1, (11.35)

where

bt = b(ηt ) =

ϒtω

0...

ω

0...

0

∈ Rm(p+q), z

t=

εt...

εt−q+1ht...

ht−p+1

∈ Rm(p+q),

and

At =

ϒtA1 · · · ϒtAq ϒtB1 · · · ϒtBp

Im 0 · · · 0 0 · · · 0

0 Im · · · 0 0 · · · 0

.... . .

. . ....

.... . .

. . ....

0 . . . Im 0 0 . . . 0 0

A1 · · · Aq B1 · · · Bp

0 · · · 0 Im 0 · · · 0

0 · · · 0 0 Im · · · 0

.... . .

. . ....

.... . .

. . ....

0 . . . 0 0 0 . . . Im 0

(11.36)

is a (p + q)m× (p + q)m matrix.

Page 306: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

290 GARCH MODELS

We obtain a vector representation, analogous to (2.16) obtained in the univariate case. Thisallows to state the following result.

Theorem 11.6 (Strict stationarity of the CCC model) A necessary and sufficient condition forthe existence of a strictly stationary and nonanticipative solution process for model (11.18) is γ < 0,where γ is the top Lyapunov exponent of the sequence {At , t ∈ Z} defined in (11.36). This stationaryand nonanticipative solution, when γ < 0, is unique and ergodic.

Proof. The proof is similar to that of Theorem 2.4. The variables ηt admitting a variance, thecondition E log+ ‖At‖ <∞ is satisfied.

It follows that when γ < 0, the series

zt= bt +

∞∑n=0

AtAt−1 . . . At−nbt−n−1 (11.37)

converges almost surely for all t . A strictly stationary solution to model (11.18) is obtained asεt = {diag(z

q+1,t )}1/2R1/2ηt where zq+1,t denotes the (q + 1)th subvector of size m of z

t. This

solution is thus nonanticipative and ergodic. The proof of the uniqueness is exactly the same as inthe univariate case.

The proof of the necessary part can also be easily adapted. From Lemma 2.1, it is sufficientto prove that limt→∞ ‖A0 . . . A−t‖ = 0. It suffices to show that, for 1 ≤ i ≤ p + q,

limt→∞A0 . . . A−t ei = 0, a.s., (11.38)

where ei = ei ⊗ Im and ei is the ith element of the canonical basis of Rp+q , since any vector xof Rm(p+q) can be uniquely decomposed as x =∑p+q

i=1 eixi , where xi ∈ Rm. As in the univariatecase, the existence of a strictly stationary solution implies that A0 . . . A−kb−k−1 tends to 0, almostsurely, as k→∞. It follows that, using the relation b−k−1 = e1ϒ−k−1ω + eq+1ω, we have

limk→∞

A0 . . . A−ke1ϒ−k−1ω = 0, limk→∞

A0 . . . A−keq+1ω = 0, a.s. (11.39)

Since the components of ω are strictly positive, (11.38) thus holds for i = q + 1. Using

A−keq+i = ϒ−kBi e1 + Bi eq+1 + eq+i+1, i = 1, . . . , p, (11.40)

with the convention that ep+q+1 = 0, for i = 1 we obtain

0 = limt→∞A0 . . . A−keq+1 ≥ lim

k→∞A0 . . . A−k+1eq+2 ≥ 0,

where the inequalities are taken componentwise. Therefore, (11.38) holds true for i = q + 2, and byinduction, for i = q + j, j = 1, . . . , p in view of (11.40). Moreover, since A−keq = ϒ−kAqe1 +Aqeq+1, (11.38) holds for i = q. We reach the same conclusion for the other values of i using anascending recursion, as in the univariate case. �

The following result provides a necessary strict stationarity condition which is simple to check.

Corollary 11.1 (Consequence of the strict stationarity) Let γ denote the top Lyapunov expo-nent of the sequence {At, t ∈ Z} defined in (11.36). Consider the matrix polynomial defined by:B(z) = Im − zB1 − . . .− zpBp , z ∈ C. Let

B =

B1 B2 · · · Bp

Im 0 · · · 00 Im · · · 0...

. . .. . .

...

0 · · · Im 0

.

Page 307: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 291

Then, if γ < 0 the following equivalent properties hold:

1. The roots of detB(z) are outside the unit disk.

2. ρ(B) < 1.

Proof. Because all the entries of the matrices At are positive, it is clear that γ is larger thanthe top Lyapunov exponent of the sequence (A∗t ) obtained by replacing the matrices Ai by 0 inAt . It is easily seen that the top Lyapunov coefficient of (A∗t ) coincides with that of the constantsequence equal to B, that is, with ρ(B). It follows that γ ≥ log ρ(B). Hence γ < 0 entails that allthe eigenvalues of B are outside the unit disk. Finally, in view of Exercise 11.14, the equivalencebetween the two properties follows from

det(B− λImp) = (−1)mpdet{λpIm − λp−1B1 − · · · − λBp−1 − Bp

}= (−λ)mpdetB

(1

λ

), λ = 0.

Corollary 11.2 Suppose that γ < 0. Let εt be the strictly stationary and nonanticipative solutionof model (11.18). There exists s > 0 such that E‖ht‖s <∞ and E‖εt‖2s <∞.

Proof. It is shown in the proof of Corollary 2.3 that the strictly stationary solution definedby (11.37) satisfies E‖z

t‖s <∞ for some s > 0. The conclusion follows from ‖εt‖ ≤ ‖zt‖ and

‖ht‖ ≤ ‖zt‖. �

11.4 Estimation of the CCC Model

We now turn to the estimation of the m-dimensional CCC-GARCH(p, q) model by the quasi-maximum likelihood method. Recall that (εt ) is called a CCC-GARCH(p, q) if it satisfies

εt = H1/2t ηt ,

Ht = DtRDt , D2t = diag(ht ),

ht = ω +q∑i=1

Aiεt−i +p∑

j=1

Bj ht−j , εt =(ε2

1t , · · · , ε2mt

)′,

(11.41)

where R is a correlation matrix, ω is a vector of size m× 1 with strictly positive coefficients, theAi and Bj are matrices of size m×m with positive coefficients, and (ηt ) is an iid sequence ofcentered variables in Rm with identity covariance matrix.

As in the univariate case, the criterion is written as if the iid process were Gaussian.The parameters are the coefficients of the matrices ω, Ai and Bj , and the coefficients of the

lower triangular part (excluding the diagonal) of the correlation matrix R = (ρij ). The number ofunknown parameters is thus

s0 = m+m2(p + q)+ m(m− 1)

2.

The parameter vector is denoted by

θ = (θ1, . . . , θs0)′ = (ω′, α′1, . . . , α

′q, β

′1, . . . , β

′p, ρ

′)′ := (ω′, α′, β ′, ρ′)′,

Page 308: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

292 GARCH MODELS

where ρ ′ = (ρ21, . . . , ρm1, ρ32, . . . , ρm2, . . . , ρm,m−1), αi = vec(Ai ), i = 1, . . . , q, and βj =vec(Bj ), j = 1, . . . , p. The parameter space is a subspace � of

]0,+∞[m×[0,∞[m2(p+q)×]− 1, 1[m(m−1)/2.

The true parameter valued is denoted by

θ0 = (ω′0, α′01, . . . , α

′0q , β

′01, . . . , β

′0p, ρ

′0)′ = (ω′0, α

′0, β

′0, ρ

′0)′.

Before detailing the estimation procedure and its properties, we discuss the conditions that needto be imposed on the matrices Ai and Bj in order to ensure the uniqueness of the parameterization.

11.4.1 Identifiability ConditionsLet Aθ (z) =

∑q

i=1 Aizi and Bθ (z) = Im −

∑p

j=1 Bj zj . By convention, Aθ (z) = 0 if q = 0 and

Bθ (z) = Im if p = 0.If Bθ (z) is nonsingular, that is, if the roots of det (Bθ (z)) = 0 are outside the unit disk, we

deduce from Bθ (B)ht = ω +Aθ (B)εt the representation

ht = Bθ (1)−1ω + Bθ (B)−1Aθ (B)εt . (11.42)

In the vector case, assuming that the polynomials Aθ0 and Bθ0 have no common root is insufficientto ensure that there exists no other pair (Aθ ,Bθ ), with the same degrees (p, q), such that

Bθ (B)−1Aθ (B) = Bθ0 (B)−1Aθ0(B). (11.43)

This condition is equivalent to the existence of an operator U(B) such that

Aθ (B) = U(B)Aθ0(B) and Bθ (B) = U(B)Bθ0 (B), (11.44)

this common factor vanishing in Bθ (B)−1Aθ (B) (Exercise 11.2).The polynomial U(B) is called unimodular if det{U(B)} is a nonzero constant. When the only

common factors of the polynomials P(B) and Q(B) are unimodular, that is, when

P(B) = U(B)P1(B), Q(B) = U(B)Q1(B) �⇒ det{U(B)} = constant,

then P(B) and Q(B) are called left coprime.The following example shows that, in the vector case, assuming that Aθ0(B) and Bθ0(B) are

left coprime is insufficient to ensure that (11.43) has no solution θ = θ0 (in the univariate case thisis sufficient because the condition Bθ (0) = Bθ0 (0) = 1 imposes U(B) = U(0) = 1).

Example 11.4 (Nonidentifiable bivariate model) For m = 2, let

Aθ0(B) =(

a11(B) a12(B)

a21(B) a22(B)

), Bθ0(B) =

(b11(B) b12(B)

b21(B) b22(B)

),

U(B) =(

1 0B 1

),

withdeg(a21) = deg(a22) = q, deg(a11) < q, deg(a12) < q

anddeg(b21) = deg(b22) = p, deg(b11) < p, deg(b12) < p.

Page 309: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 293

The polynomial A(B) = U(B)Aθ0 (B) has the same degree q as Aθ0(B), and B(B) = U(B)Bθ0 (B)

is a polynomial of the same degree p as Bθ0(B). On the other hand, U(B) has a nonzero determinantwhich is independent of B, hence it is unimodular. Moreover, B(0) = Bθ0 (0) = Im and A(0) =Aθ0(0) = 0. It is thus possible to find θ such that B(B) = Bθ (B), A(B) = Aθ (B) and ω = U(1)ω0.The model is thus nonidentifiable, θ and θ0 corresponding to the same representation (11.42).

Identifiability can be ensured by several types of conditions; see Reinsel (1997, pp. 37-40),Hannan (1976) or Hannan and Deistler (1988, sec. 2.7). To obtain a mild condition define, forany column i of the matrix operators Aθ (B) and Bθ (B), the maximal degrees qi(θ) and pi(θ),respectively. Suppose that maximal values are imposed for these orders, that is,

∀θ ∈ �, ∀i = 1, . . . , m, qi(θ) ≤ qi and pi(θ) ≤ pi, (11.45)

where qi ≤ q and pi ≤ p are fixed integers. Denote by aqi (i) (bpi (i)) the column vector of thecoefficients of Bqi (Bpi ) in the ith column of Aθ0(B) (Bθ0 (B)).

Example 11.5 (Illustration of the notation on an example) For

Aθ0(B) =(

1+ a11B2 a12B

a21B2 + a∗21B 1+ a22B

), Bθ0 (B) =

(1+ b11B

4 b12B

b21B4 1+ b22B

),

with a11a21a12a22b11b21b12b22 = 0, we have

q1(θ0) = 2, q2(θ0) = 1, p1(θ0) = 4, p2(θ0) = 1and

a2(1) =(

a11

a21

), a1(2) =

(a12

a22

), b4(1) =

(b11

b21

), b1(2) =

(b12

b22

).

Proposition 11.1 (A simple identifiability condition) If the matrix

M(Aθ0 ,Bθ0 ) = [aq1 (1) · · · aqm(m) bp1 (1) · · · bpm(m)] (11.46)

has full rank m, the parameters α0 and β0 are identified by the constraints (11.45) with qi = qi(θ0)

and pi = pi(θ0) for any value of i.

Proof. From the proof of the theorem in Hannan (1969), U(B) satisfying (11.44) is a unimodularmatrix of the form U(B) = U0 + U1B + . . .+ UkB

k. Since the term of highest degree (columnby column) of Aθ0(B) is [aq1(1)B

q1 · · · aqm(m)Bqm ], the ith column of Aθ (B) = U(B)Aθ0 (B) isa polynomial in B of degree less than qi if and only if Ujaqi (i) = 0, for j = 1, . . . , k. Similarly,we must have Ujbpi (i) = 0, for j = 1, . . . , k and i = 1, . . . m. It follows that UjM(Aθ0 ,Bθ0) = 0,which implies that Uj = 0 for j = 1, . . . , k thanks to condition (11.46). Consequently U(B) = U0

and, since, for all θ , Bθ (0) = Im, we have U(B) = Im. �

Example 11.6 (Illustration of the identifiability condition) In Example 11.4,

M(Aθ0 ,Bθ0 ) = [aq(1)aq(2)bp(1)bp(2)] =[

0 0 0 0× × × ×

]is not a full-rank matrix. Hence, the identifiability condition of Proposition 11.1 is not satisfied.Indeed, the model is not identifiable.

A simpler, but more restrictive, condition is obtained by imposing the requirement that

M1(Aθ0 ,Bθ0 ) = [Aq Bp]

Page 310: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

294 GARCH MODELS

has full rank m. This entails uniqueness under the constraint that the degrees of Aθ and Bθ areless than q and p, respectively.

Example 11.7 (Another illustration of the identifiability condition) Turning again to Example11.5 with a12b21 = a22b11 and, for instance, a21 = 0 and a22 = 0, observe that the matrix

M1(Aθ0 ,Bθ0 ) =[

0 a12 b11 00 a22 b21 0

]does not have full rank, but the matrix

M(Aθ0 ,Bθ0) =[a11 a12 b11 b12

0 a22 b21 b22

]does have full rank.

More restrictive forms, such as the echelon form, are sometimes required to ensure identifiability.

11.4.2 Asymptotic Properties of the QMLEof the CCC-GARCH model

Let (ε1, . . . , εn) be an observation of length n of the unique nonanticipative and strictlystationary solution (εt ) of model (11.41). Conditionally on nonnegative initial valuesε0, . . . , ε1−q, h0, . . . , h1−p , the Gaussian quasi-likelihood is written as

Ln(θ) = Ln(θ; ε1, . . . , εn) =n∏t=1

1

(2π)m/2|Ht |1/2exp

(−1

2ε′t H

−1t εt

),

where the Ht are recursively defined, for t ≥ 1, by{Ht = DtRDt , Dt = {diag(ht )}1/2

ht = ht (θ) = ω +∑q

i=1 Ai εt−i +∑p

j=1 Bj ht−j .

A QMLE of θ is defined as any measurable solution θn such that

θn = arg maxθ∈�

Ln(θ) = arg minθ∈�

ln(θ), (11.47)

where

ln(θ) = n−1n∑t=1

t , t = t (θ) = ε′t H−1t εt + log |Ht |.

Remark 11.6 (Choice of initial values) It will be shown later that, as in the univariate case, theinitial values have no influence on the asymptotic properties of the estimator. These initial valuescan be fixed, for instance, so that

ε0 = · · · = ε1−q = h1 = · · · = h1−p = 0.

They can also be chosen as functions of θ , such as

ε0 = · · · = ε1−q = h1 = · · · = h1−p = ω,

Page 311: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 295

or as random variable functions of the observations, such as

ht = εt =

ε21t...

ε2mt

, t = 0,−1, . . . , 1− r,

where the first r = max{p, q} observations are denoted by ε1−r , . . . , ε0.

Let γ (A0) denote the top Lyapunov coefficient of the sequence of matrices A0 = (A0t ) defined asin (11.36), at θ = θ0. The following assumptions will be used to establish the strong consistencyof the QMLE.

A1: θ0 ∈ � and � is compact.

A2: γ (A0) < 0 and, for all θ ∈ �, detB(z) = 0 ⇒ |z|> 1.

A3: The components of ηt are independent and their squares have nondegenerate distributions.

A4: If p> 0, then Aθ0 (z) and Bθ0 (z) are left coprime and M1(Aθ0 ,Bθ0 ) has full rank m.

A5: R is a positive definite correlation matrix for all θ ∈ �.If the space � is constrained by (11.45), that is, if maximal orders are imposed for each componentof εt and ht in each equation, then assumption A4 can be replaced by the following more generalcondition:

A4′: If p> 0, then Aθ0 (z) and Bθ0 (z) are left coprime and M(Aθ0 ,Bθ0) has full rank m.

It will be useful to approximate the sequence ( t (θ)) by an ergodic and stationary sequence.Assumption A2 implies that, for all θ ∈ �, the roots of Bθ (z) are outside the unit disk. Denote by(ht)t= {ht (θ)}t the strictly stationary, nonanticipative and ergodic solution of

ht = ω +q∑i=1

Aiεt−i +p∑j=1

Bj ht−j , ∀t. (11.48)

Now, letting Dt = {diag(ht )}1/2 and Ht = DtRDt, we define

ln(θ) = ln(θ; εn, εn−1 . . . , ) = n−1n∑t=1

t , t = t (θ) = ε′tH−1t εt + log |Ht |.

We are now in a position to state the following consistency theorem.

Theorem 11.7 (Strong consistency) Let (θn) be a sequence of QMLEs satisfying (11.47). Then,under A1−A5 (or A1−A3,A4′ and A5),

θn → θ0, almost surely when n→∞.

To establish the asymptotic normality we require the following additional assumptions:

A6: θ0 ∈◦�, where

◦� is the interior of �.

A7: E‖ηtη′t‖2 <∞.

Page 312: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

296 GARCH MODELS

Theorem 11.8 (Asymptotic normality) Under the assumptions of Theorem 11.7 and A6−A7,√n(θn − θ0) converges in distribution to N(0, J−1IJ−1), where J is a positive definite matrix and

I is a positive semi-definite matrix, defined by

I = E

(∂ t (θ0)

∂θ

∂ t (θ0)

∂θ ′

), J = E

(∂2 t (θ0)

∂θ∂θ ′

).

11.4.3 Proof of the Consistency and the Asymptotic Normalityof the QML

We shall use the multiplicative norm (see Exercises 11.5 and 11.6) defined by

‖A‖ := sup‖x‖≤1

‖Ax‖ = ρ1/2(A′A), (11.49)

where A is a d1 × d2 matrix, ‖x‖ is the Euclidean norm of vector x ∈ Rd2 , and ρ(·) denotes thespectral radius. This norm satisfies, for any d2 × d1 matrix B,

‖A‖2 ≤∑i,j

a2i,j = Tr(A′A) ≤ d2‖A‖2, |A′A| ≤ ‖A‖2d2 , (11.50)

|Tr(AB)| ≤∑

i,j

a2i,j

1/2∑i,j

b2i,j

1/2

≤ {d2d1}1/2‖A‖‖B‖. (11.51)

Proof of Theorem 11.7

The proof is similar to that of Theorem 7.1 for the univariate case.Rewrite (11.48) in matrix form as

Ht = ct + BHt−1, (11.52)

where B is defined in Corollary 11.1 and

Ht =

htht−1...

ht−p+1

, ct =

ω +

q∑i=1

Aiεt−i

0...

0

. (11.53)

We shall establish the intermediate results (a), (c) and (d) which are stated as in the univariatecase (see Section 7.4 of Chapter 7), result (b) being replaced by

(b)′{ht (θ) = ht (θ0) Pθ0 a.s. and R(θ) = R(θ0)

} �⇒ θ = θ0.

Proof of (a): initial values are asymptotically irrelevant. In view of assumption A2 andCorollary 11.1, we have ρ(B) < 1. By the compactness of �, we even have

supθ∈�

ρ(B) < 1. (11.54)

Iteratively using equation (11.52), as in the univariate case, we deduce that almost surely

supθ∈�

‖Ht − Ht‖ ≤ Kρt, ∀t, (11.55)

where Ht denotes the vector obtained by replacing the variables ht−i by ht−i in Ht . Observe thatK is a random variable that depends on the past values {εt , t ≤ 0}. Since K does not depend onn, it can be considered as a constant, such as ρ. From (11.55) we deduce that, almost surely,

Page 313: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 297

supθ∈�

‖Ht − Ht‖ ≤ Kρt , ∀t. (11.56)

Noting that ‖R−1‖ is the inverse of the eigenvalue of smallest modulus of R, and that ‖D−1t ‖ =

{mini (hii,t )}−1, we have

supθ∈�

‖H−1t ‖ ≤ sup

θ∈�‖D−1

t ‖2‖R−1‖ ≤ supθ∈�{min

iω(i)}−2‖R−1‖ ≤ K, (11.57)

using A5, the compactness of � and the strict positivity of the components of ω. Similarly,we have

supθ∈�

‖H−1t ‖ ≤ K. (11.58)

Nowsupθ∈�

|ln(θ)− ln(θ)|

≤ n−1n∑t=1

supθ∈�

∣∣ε′t (H−1t − H−1

t )εt∣∣+ n−1

n∑t=1

supθ∈�

∣∣log |Ht | − log |Ht |∣∣ . (11.59)

The first sum can be written as

n−1n∑t=1

supθ∈�

∣∣ε′t H−1t (Ht − Ht )H

−1t εt

∣∣= n−1

n∑t=1

supθ∈�

∣∣ Tr {ε′t H−1t (Ht − Ht )H

−1t εt }

∣∣= n−1

n∑t=1

supθ∈�

∣∣ Tr {H−1t (Ht − Ht )H

−1t εt ε

′t }∣∣

≤ Kn−1n∑t=1

supθ∈�

‖H−1t ‖‖Ht − Ht‖‖H−1

t ‖‖εt ε′t‖

≤ Kn−1n∑t=1

ρt‖εt ε′t‖

→ 0

as n→∞, using (11.51), (11.56), (11.57), (11.58), the Cesaro lemma and the fact thatρt‖εt ε′t‖ = ρtε′t εt → 0 a.s.5 Now, using (11.50), the triangle inequality and, for x ≥ −1,log(1+ x) ≤ x, we have

log |Ht | − log |Ht | = log |Im + (Ht − Ht )H−1t |

≤ m log ‖Im + (Ht − Ht )H−1t ‖

≤ m log(‖Im‖ + ‖(Ht − Ht )H−1t ‖)

≤ m log(1+ ‖(Ht − Ht )H−1t ‖)

≤ m‖Ht − Ht‖‖H−1t ‖,

5 The latter statement can be shown by using the Borel–Cantelli lemma, the Markov inequality and byapplying Corollary 11.2:

∞∑t=1

P(ρt ε′t εt > ε) ≤∞∑t=1

ρstE(ε ′t εt )s

εs=

∞∑t=1

ρstE‖εt‖2s

εs<∞.

Page 314: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

298 GARCH MODELS

and, by symmetry,log |Ht | − log |Ht | ≤ m‖Ht − Ht‖‖H−1

t ‖.Using (11.56), (11.57) and (11.58) again, we deduce that the second sum in (11.59) tends to 0.We have thus shown that almost surely as n→∞,

supθ∈�

∣∣ln(θ)− ln(θ)∣∣→ 0.

Proof of (b)′: identifiability of the parameter. Suppose that, for some θ = θ0,

ht (θ) = ht (θ0), Pθ0 -a.s. and R(θ) = R(θ0).

Then it readily follows that ρ = ρ0 and, using the invertibility of the polynomial Bθ (B) underassumption A2, by (11.42),

Bθ (1)−1ω + Bθ (B)−1Aθ (B)εt = Bθ0(1)−1ω0 + Bθ0 (B)

−1Aθ0 (B)εt

that is,

Bθ (1)−1ω − Bθ0(1)−1ω0 = {Bθ0 (B)

−1Aθ0(B)− Bθ (B)−1Aθ (B)}εt

:= P(B)εt a.s. ∀t.Let P(B) =∑∞

i=0 PiBi . Noting that P0 = P(0) = 0 and isolating the terms that are functions

of ηt−1,P1(h11,t−1η

21,t−1, . . . , hmm,t−1η

2m,t−1)

′ = Zt−2, a.s.,

where Zt−2 belongs to the σ -field generated by {ηt−2, ηt−3, . . .}. Since ηt−1 is independent of thisσ -field, Exercise 11.3 shows that the latter equality contradicts A3 unless, for i, j = 1, . . . , m,pijhjj,t = 0 almost surely, where the pij are the entries of P1. Because hjj,t > 0 for all j , wethus have P1 = 0. Similarly, we show that P(B) = 0 by successively considering the past valuesof ηt−1. Therefore, in view of A4 (or A4′), we have α = α0 and β = β0 (see Section 11.4.1). Itreadily follows that ω = ω0. Hence θ = θ0. We have thus established (b)′.

Proof of (c): the limit criterion is minimized at the true value. As in the univariate case, wefirst show that Eθ0 t (θ) is well defined on R ∪ {+∞} for all θ , and on R for θ = θ0. We have

Eθ0 −t (θ) ≤ Eθ0 log− |Ht | ≤ max{0,− log(|R|min

iω(i)m)} <∞.

At θ0, Jensen’s inequality, the second inequality in (11.50) and Corollary 11.2 entail that

Eθ0 log |Ht(θ0)| = Eθ0

m

slog |Ht(θ0)|s/m

≤ m

slogEθ0‖Ht(θ0)‖s ≤ m

slogEθ0‖R‖s‖Dt(θ0)‖2s

≤ K + m

slogEθ0‖Dt(θ0)‖2s

= K + m

slogEθ0 (max

ihii,t (θ0))

s

≤ K + m

slogEθ0

{∑i

h2ii,t (θ0)

}s/2

= K + m

slogEθ0‖ht (θ0)‖s <∞.

Page 315: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 299

It follows that

Eθ0 t (θ0) = Eθ0

{η′tHt (θ0)

1/2′Ht(θ0)−1Ht(θ0)

1/2ηt + log |Ht(θ0)|}

= m+ Eθ0 log |Ht(θ0)| <∞.

Because Eθ0 −t (θ0) <∞, the existence of Eθ0 t (θ0) in R holds. It is thus not restrictive to study

the minimum of Eθ0 t (θ) for the values of θ such that Eθ0 | t (θ)| <∞. Denoting by λi,t , thepositive eigenvalues of Ht(θ0)H

−1t (θ) (see Exercise 11.15), we have

Eθ0 t (θ)− Eθ0 t (θ0)

= Eθ0 log|Ht(θ)||Ht(θ0)| + Eθ0

{η′t [H

1/2t (θ0)

′H−1t (θ)H

1/2t (θ0)− Im]ηt

}= Eθ0 log{|Ht(θ)H

−1t (θ0)|}

+ Tr(Eθ0

{[H 1/2

t (θ0)′H−1

t (θ)H1/2t (θ0)− Im]

}E(ηtη

′t ))

= Eθ0 log{|Ht(θ)H−1t (θ0)|} + Eθ0

(Tr{[Ht(θ0)H

−1t (θ)− Im]

})= Eθ0

{m∑i=1

(λit − 1− logλit )

}≥ 0

because log x ≤ x − 1 for all x > 0. Since log x = x − 1 if and only if x = 1, the inequality isstrict unless if, for all i, λit = 1 Pθ0 -a.s., that is, if Ht(θ) = Ht(θ0), Pθ0 -a.s. (by Exercise 11.15).This equality is equivalent to

ht (θ) = ht (θ0), Pθ0 -a.s., R(θ) = R(θ0)

and thus to θ = θ0, from (b)′.

Proof of (d). The last part of the proof of the consistency uses the compactness of � and theergodicity of ( t (θ)), as in the univariate case.

Theorem 11.7 is thus established. �

Proof of Theorem 11.8We start by stating a few elementary results on the differentiation of expressions involving matrices.If f (A) is a real-valued function of a matrix A whose entries aij are functions of some variablex, the chain rule for differentiation of compositions of functions states that

∂f (A)

∂x=∑i,j

∂f (A)

∂aij

∂aij

∂x= Tr

{∂f (A)

∂A′∂A

∂x

}. (11.60)

Moreover, for A invertible we have

∂c′Ac∂A′

= cc′, (11.61)

∂Tr(CA′BA′)∂A′

= C′AB ′ + B ′AC ′, (11.62)

Page 316: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

300 GARCH MODELS

∂ log |det(A)|∂A′

= A−1, (11.63)

∂A−1

∂x= −A−1 ∂A

∂xA−1, (11.64)

∂Tr(CA−1B)

∂A′= −A−1BCA−1, (11.65)

∂Tr(CAB)

∂A′= BC. (11.66)

(a) First derivative of the criterion. Applying (11.60) and (11.61), then (11.62), (11.63) and(11.64), we obtain

∂ t (θ)

∂θi= Tr

(εt ε

′t

∂D−1t R−1D−1

t

∂θi

)+ 2

∂ log |detDt |∂θi

= −Tr

{(εt ε

′tD

−1t R−1 + R−1D−1

t εt ε′t

)D−1t

∂Dt

∂θiD−1t

}+2Tr

(D−1t

∂Dt

∂θi

), (11.67)

for i = 1, . . . , s1 = m+ (p + q)m2, and using (11.65),

∂ t (θ)

∂θi= −Tr

(R−1D−1

t εt ε′tD

−1t R−1 ∂R

∂θi

)+ Tr

(R−1 ∂R

∂θi

), (11.68)

for i = s1 + 1, . . . , s0. Letting D0t = Dt(θ0), R0 = R(θ0),

D(i)0t =

∂Dt

∂θi(θ0), R

(i)0 = ∂R

∂θi(θ0), D

(i,j)

0t = ∂2Dt

∂θi∂θj(θ0), R

(i,j)

0 = ∂2R

∂θi∂θj(θ0),

and ηt = R1/2ηt , the score vector is written as

∂ t (θ0)

∂θi= Tr

{(Im − R−1

0 ηt η′t

)D(i)0t D

−10t +

(Im − ηt η

′tR−10

)D−1

0t D(i)0t

}, (11.69)

for i = 1, . . . , s1, and

∂ t (θ0)

∂θi= Tr

{(Im − R−1

0 ηt η′t

)R−1

0 R(i)0

}, (11.70)

for i = s1 + 1, . . . , s0.

(b) Existence of moments of any order for the score. In view of (11.51) and theCauchy–Schwarz inequality, we obtain

E

∣∣∣∣∂ t (θ0)

∂θi

∂ t (θ0)

∂θj

∣∣∣∣ ≤ K

{E

∥∥∥D−10t D

(i)0t

∥∥∥2E

∥∥∥D−10t D

(j)

0t

∥∥∥2}1/2

,

for i, j = 1, . . . , s1,

E

∣∣∣∣∂ t (θ0)

∂θi

∂ t (θ0)

∂θj

∣∣∣∣ < KE

∥∥∥D−10t D

(i)0t

∥∥∥ ,

Page 317: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 301

for i = 1, . . . , s1 and j = s1 + 1, . . . , s0, and

E

∣∣∣∣∂ t (θ0)

∂θi

∂ t (θ0)

∂θj

∣∣∣∣ < K,

for i, j = s1 + 1, . . . , s0. Note also that

D(i)0t =

1

2D−1

0t diag

{∂ht

∂θi(θ0)

}.

To show that the score admits a second-order moment, it is thus sufficient to prove that

E

∣∣∣∣ 1

ht (i1)

∂ht (i1)

∂θi(θ0)

∣∣∣∣r0 <∞for all i1 = 1, . . . , m, all i = 1, . . . , s1 and r0 = 2. By (11.52) and (11.54),

supθ∈�

∥∥∥∥∂Ht

∂θi

∥∥∥∥ <∞, i = 1, . . . , m,

and, setting s2 = m+ qm2,

θi∂Ht

∂θi≤ Ht , i = m+ 1, . . . , s2.

On the other hand, we have

∂Ht

∂θi=

∞∑k=1

k∑

j=1

Bj−1B(i)Bk−j

ct−k, i = s2 + 1, . . . , s1,

where B(i) = ∂B/∂θi is a matrix whose entries are all 0, apart from a 1 located at the same placeas θi in B. In an abuse of notation, we denote by Ht (i1) and h0t (i1) the i1th components of Ht

and ht (θ0). With arguments similar to those used in the univariate case, that is, the inequalityx/(1+ x) ≤ xs for all x ≥ 0 and s ∈ [0, 1], and the inequalities

θi∂Ht

∂θi≤

∞∑k=1

kBkct−k, θi∂Ht (i1)

∂θi≤

∞∑k=1

k

m∑j1=1

Bk(i1, j1)ct−k(j1)

and, setting ω = inf1≤i≤m ω(i),

Ht (i1) ≥ ω +m∑

j1=1

Bk(i1, j1)ct−k(j1), ∀k,

we obtain

θi

Ht (i1)

∂Ht (i1)

∂θi≤

m∑j1=1

∞∑k=1

k

{Bk(i1, j1)ct−k(j1)

ω

}s/r0≤ K

m∑j1=1

∞∑k=1

kρkj1cs/r0t−k (j1),

where the constants ρj1 (which also depend on i1, s and r0) belong to the interval [0, 1). Noting

that these inequalities are uniform on a neighborhood of θ0 ∈◦�, that they can be extended to

higher-order derivatives, as in the univariate case, and that Corollary 11.2 implies that ‖ct‖s <∞,

Page 318: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

302 GARCH MODELS

we can show a stronger result than the one stated: for all i1 = 1, . . . , m, all i, j, k = 1, . . . , s1 andall r0 ≥ 0, there exists a neighborhood V(θ0) of θ0 such that

E supθ∈V(θ0)

∣∣∣∣ 1

ht (i1)

∂ht (i1)

∂θi(θ)

∣∣∣∣r0 <∞, (11.71)

E supθ∈V(θ0)

∣∣∣∣ 1

ht (i1)

∂2ht (i1)

∂θi∂θj(θ)

∣∣∣∣r0 <∞ (11.72)

and

E supθ∈V(θ0)

∣∣∣∣ 1

ht (i1)

∂3ht (i1)

∂θi∂θj ∂θk(θ)

∣∣∣∣r0 <∞. (11.73)

(c) Asymptotic normality of the score vector. Clearly, {∂ t (θ0)/∂θ}t is stationary and∂ t (θ0)/∂θ is measurable with respect to the σ -field Ft generated by {ηu, u ≤ t}. From (11.69)and (11.70), we have E {∂ t (θ0)/∂θ | Ft−1} = 0. Property (b), and in particular (11.71), ensuresthe existence of the matrix

I := E∂ t (θ0)

∂θ

∂ t (θ0)

∂θ ′.

It follows that, for all λ ∈ Rp+q+1, the sequence{λ′ ∂

∂θ t (θ0),Ft

}t

is an ergodic, stationary andsquare integrable martingale difference. Corollary A.1 entails that

n−1/2n∑t=1

∂θ t (θ0)

L→ N (0, I ) .

(d) Higher-order derivatives of the criterion. Starting from (a) and applying (11.60) and (11.65)several times, as well as (11.66), we obtain

∂ 2t (θ)

∂θi∂θj= Tr (c1 + c2 + c3) , i, j = 1, . . . , s1,

where

c1 = D−1t R−1D−1

t

∂Dt

∂θiD−1t εt ε

′tD

−1t

∂Dt

∂θj+D−1

t

∂Dt

∂θiD−1t εt ε

′tD

−1t R−1D−1

t

∂Dt

∂θj

+D−1t εt ε

′tD

−1t R−1D−1

t

∂Dt

∂θiD−1t

∂Dt

∂θj−D−1

t εt ε′tD

−1t R−1D−1

t

∂2Dt

∂θi∂θj,

c2 = −2D−1t

∂Dt

∂θiD−1t

∂Dt

∂θj+ 2D−1

t

∂2Dt

∂θi∂θj,

and c3 is obtained by permuting εtε′t and R−1 in c1. We also obtain

∂ 2t (θ)

∂θi∂θj= Tr (c4 + c5) , i = 1, . . . , s1, j = s1 + 1, . . . , s0,

∂ 2t (θ)

∂θi∂θj= Tr (c6) , i, j = s1 + 1, . . . , s0,

Page 319: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 303

where

c4 = R−1D−1t

∂Dt

∂θiD−1t εt ε

′tD

−1t R−1 ∂R

∂θj,

c6 = R−1D−1t εt ε

′tD

−1t R−1 ∂R

∂θiR−1 ∂R

∂θj+ R−1 ∂R

∂θiR−1D−1

t εt ε′tD

−1t R−1 ∂R

∂θj

−R−1D−1t εt ε

′tD

−1t R−1 ∂2R

∂θi∂θj− R−1 ∂2R

∂θi∂θj− R−1 ∂R

∂θiR−1 ∂R

∂θj,

and c5 is obtained by permuting εt ε′t and ∂Dt/∂θi in c4. Results (11.71) and (11.72) ensure the

existence of the matrix J := E∂2 t (θ0)/∂θ∂θ′, which is invertible, as shown in (e) below. Note

that with our parameterization, ∂2R/∂θi∂θj = 0.Continuing the differentiations, it can be seen that ∂ 3

t (θ)/∂θi∂θj ∂θk is also the trace of a sumof products of matrices similar to the ci . The integrable matrix εt ε′t appears at most once in each ofthese products. The other terms are, on the one hand, the bounded matrices R−1, ∂R/∂θi and D−1

t

and, on the other hand, the matrices D−1t ∂Dt/∂θi , D

−1t ∂2Dt/∂θi∂θj and D−1

t ∂3Dt/∂θi∂θj ∂θk .From (11.71)–(11.73), the norms of the latter three matrices admit moments at any orders in theneighborhood of θ0. This shows that

E supθ∈V(θ0)

∣∣∣∣ ∂3 t (θ)

∂θi∂θj ∂θk

∣∣∣∣ <∞.

(e) Invertibility of the matrix J . The expression for J obtained in (d), as a function of thepartial derivatives of Dt and R, is not in a convenient form for showing its invertibility. We startby writing J as a function of Ht and of its derivatives. Starting from

t (θ) = ε′tH−1t εt + log |Ht |,

the differentiation formulas (11.60), (11.63) and (11.65) give

∂ t

∂θi= Tr

{(H−1t −H−1

t εt ε′tH

−1t

) ∂Ht

∂θi

},

and then, using (11.64) and (11.66),

∂2 t

∂θi∂θj= Tr

(H−1t

∂2Ht

∂θi∂θj

)− Tr

(H−1t

∂Ht

∂θjH−1t

∂Ht

∂θi

)+Tr

(H−1t εt ε

′tH

−1t

∂Ht

∂θiH−1t

∂Ht

∂θj

)+ Tr

(H−1t

∂Ht

∂θiH−1t εt ε

′tH

−1t

∂Ht

∂θj

)−Tr

(H−1t εt ε

′tH

−1t

∂2Ht

∂θi∂θj

).

From the relation Tr(A′B) = (vecA)′vecB, we deduce that

E

(∂2 t (θ0)

∂θi∂θj| Ft−1

)= Tr

(H−1

0t H(i)0t H

−10t H

(j)

0t

)= h′ihj ,

where, using vec(ABC) = (C ′ ⊗ A)vecB,

hi = vec(H−1/20t H

(i)0t H

−1/20t

)=(H−1/20t ⊗H

−1/20t

)di , di = vec

(H

(i)0t

).

Page 320: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

304 GARCH MODELS

Introducing the m2 × s0 matrices

h = (h1 | · · · | hs0) and d = (d1 | · · · | ds0 ),

we have h = Hd with H = H−1/20t ⊗H

−1/20t . Now suppose that J = Eh′h is singular. Then, there

exists a nonzero vector c ∈ Rs0 , such that c′J c = Ec′h′hc = 0. Since c′h′hc ≥ 0 almost surely,we have

c′h′hc = c′d′H2dc = 0 a.s. (11.74)

Because H2 is a positive definite matrix, with probability 1, this entails that dc = 0m2 withprobability 1. Decompose c into c = (c′1, c′2)

′ with c1 ∈ Rs1 and c2 ∈ Rs3 , where s3 = s0 − s1 =m(m− 1)/2. Rows 1, m+ 1, . . . , m2 of the equations

dc =s0∑i=1

ci∂

∂θivecH0t =

s0∑i=1

ci∂

∂θi(D0t ⊗D0t ) vecR0 = 0m2 , a.s., (11.75)

give

s1∑i=1

ci∂

∂θiht (θ0) = 0m, a.s. (11.76)

Differentiating equation (11.48) yields

s1∑i=1

ci∂

∂θiht = ω∗ +

q∑j=1

A∗j εt−j +p∑j=1

B∗j ht−j +p∑

j=1

Bj

s1∑i=1

ci∂

∂θiht−j ,

where

ω∗ =s1∑i=1

ci∂

∂θiω, A∗j =

s1∑i=1

ci∂

∂θiAj , B∗j =

s1∑i=1

ci∂

∂θiBj .

Because (11.76) is satisfied for all t , we have

ω∗0 +q∑

j=1

A∗0j εt−j +p∑j=1

B∗0j ht−j (θ0) = 0,

where quantities evaluated at θ = θ0 are indexed by 0. This entails that

ht (θ0) = ω0 − ω∗0 +q∑

j=1

(A0j − A∗0j

)εt−j +

p∑j=1

(B0j − B∗0j

)ht−j (θ0),

and finally, introducing a vector θ1 whose s1 first components are vec(ω0 − ω∗0 | A01 − A∗01 |

· · · | B0p − B∗0p)

,ht (θ0) = ht (θ1)

by choosing c1 small enough so that θ1 ∈ �. If c1 = 0 then θ1 = θ0. This is in contradiction to theidentifiability of the parameter, hence c1 = 0. Equations (11.75) thus become

(D0t ⊗D0t )

s0∑i=s1+1

ci∂

∂θivecR0 = 0m2 , a.s.

Page 321: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 305

Therefore,s0∑

i=s1+1

ci∂

∂θivecR0 = 0m2 .

Because the vectors ∂vecR/∂θi , i = s1 + 1, . . . , s0, are linearly independent, the vector c2 =(cs1+1, . . . , cs0

)′is nul, and thus c = 0. This contradicts (11.74), and shows that the assumption

that J is singular is absurd.

(f) Asymptotic irrelevance of the initial values. First remark that (11.55) and the argumentsused to show (11.57) and (11.58) entail that

supθ∈�

‖Dt − Dt‖ ≤ Kρt, supθ∈�

‖D−1t ‖ ≤ K, sup

θ∈�‖D−1

t ‖ ≤ K, (11.77)

and thus

supθ∈�

‖D1/2t − D

1/2t ‖ ≤ Kρt, sup

θ∈�‖D−1/2

t ‖ ≤ K, supθ∈�

‖D−1/2t ‖ ≤ K,

supθ∈�

‖D1/2t D

−1/2t ‖ ≤ K(1+ ρt ), sup

θ∈�‖D1/2

t D−1/2t ‖ ≤ K(1+ ρt ). (11.78)

From (11.52), we have

Ht =t−r−1∑k=0

Bkct−k + Bt−rHr , Ht =t−r−1∑k=0

Bkct−k + Bt−rHr ,

where r = max{p, q} and the tilde means that initial values are taken into account. Since ct = ctfor all t > r , we have Ht − Ht = Bt−r (Hr − Hr

)and

∂θi

(Ht − Ht

) = Bt−r ∂

∂θi

(Hr − Hr

)+ t−r∑j=1

Bj−1B(i)Bt−r−j (Hr − Hr

).

Thus (11.54) entails that

supθ∈�

∥∥∥∥ ∂

∂θi

(Dt − Dt

)∥∥∥∥ ≤ Kρt . (11.79)

BecauseD−1t − D−1

t = D−1t

(Dt −Dt

)D−1t ,

we thus have (11.77), implying that

supθ∈�

∥∥(D−1t − D−1

t

)∥∥ ≤ Kρt , supθ∈�

∥∥∥(D−1/2t − D

−1/2t

)∥∥∥ ≤ Kρt . (11.80)

Denoting by h0t (i1) the i1th component of ht (θ0),

h0t (i1) = c0 +∞∑k=0

m∑j1=1

m∑j2=1

q∑i=1

A0i (j1, j2)Bk0(i1, j1)ε

2j2,t−k−i ,

where c0 is a strictly positive constant and, by the usual convention, the index 0 corresponding toquantities evaluated at θ = θ0. For a sufficiently small neighborhood V(θ0) of θ0, we have

supθ∈V(θ0)

A0i (j1, j2)

Ai (j1, j2)< K, sup

θ∈V(θ0)

Bk0(i1, j1)

Bk(i1, j1)< 1+ δ,

Page 322: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

306 GARCH MODELS

for all i1, j1, j2 ∈ {1, . . . , m} and all δ > 0. Moreover, in ht (i1), the coefficient of Bk(i1, j1)ε2j2,t−k−i

is bounded below by a constant c > 0 uniformly on θ ∈ V(θ0). We thus have

h0t (i1)

ht (i1)≤ K +K

∞∑k=0

m∑j1=1

m∑j2=1

q∑i=1

(1+ δ)kBk(i1, j1)ε2j2,t−k−i

ω + cBk(i1, j1)ε2j2,t−k−i

≤ K +K

m∑j2=1

q∑i=1

∞∑k=0

(1+ δ)kρksε2sj2,t−k−i ,

for some ρ ∈ [0, 1), all δ > 0 and all s ∈ [0, 1]. Corollary 11.2 then implies that, for all r0 ≥ 0,

E supθ∈V(θ0)

∣∣∣∣h0t (i1)

ht (i1)

∣∣∣∣r0 <∞.

From this we deduce that

E supθ∈V(θ0)

‖D−1/2t εt‖2 = E sup

θ∈V(θ0)

‖D−1/2t D

1/20t ηt‖2 <∞, (11.81)

supθ∈V(θ0)

‖D−1/2t εt‖ ≤ (1+Kρt) sup

θ∈V(θ0)

‖D−1/2t εt‖. (11.82)

The last inequality follows from (11.77) because

D−1/2t εt = D

−1/2t

(D

1/2t −D

1/2t

)D−1/2t εt −D

−1/2t εt .

By (11.67) and (11.68),∂ t (θ)

∂θi− ∂ t (θ)

∂θi= Tr(c1 + c2 + c3),

where

c1 = −D−1/2t εt ε

′t D

−1t R−1 (D−1

t − D−1t

)D

1/2t D

−1/2t

∂Dt

∂θiD−1/2t ,

c2 = −D−1/2t εt ε

′t D

−1t R−1D−1

t

(∂Dt

∂θi− ∂Dt

∂θi

)D−1/2t ,

and c3 contains terms which can be handled as c1 and c2. Using (11.77)–(11.82), theCauchy–Schwarz inequality, and

E supθ∈V(θ0)

∥∥∥∥D−1/2t

∂Dt

∂θiD−1/2t

∥∥∥∥2

<∞,

which follows from (11.71), we obtain

supθ∈V(θ0)

∣∣∣∣∂ t (θ)∂θi− ∂ t (θ)

∂θi

∣∣∣∣ ≤ Kρtut ,

where ut is an integrable variable. From the Markov inequality, n−1/2∑nt=1 ρ

tut = oP (1), whichimplies that ∥∥∥∥∥n−1/2

n∑t=1

{∂ t (θ0)

∂θ− ∂ t (θ0)

∂θ

}∥∥∥∥∥ = oP (1).

Page 323: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 307

We have in fact shown that this convergence is uniform on a neighborhood of θ0, but this isof no direct use for what follows. By exactly the same arguments,

supθ∈V(θ0)

∣∣∣∣∂2 t (θ)

∂θi∂θj− ∂2 t (θ)

∂θi∂θj

∣∣∣∣ ≤ Kρtu∗t ,

where u∗t is an integrable random variable, which entails that

n−1/2n∑t=1

supθ∈V(θ0)

∥∥∥∥∂2 t (θ)

∂θ∂θ ′− ∂ 2

t (θ)

∂θ∂θ ′

∥∥∥∥ = OP (n−1) = oP (1).

It now suffices to observe that the analogs of steps (a)–(f) in Section 7.4 have been verified,and we are done. �

11.5 Bibliographical Notes

Multivariate ARCH models were first considered by Engle, Granger and Kraft (1984), in theguise of the diagonal model. This model was extended and studied by Bollerslev, Engle andWoolridge (1988). The reader may refer to Hafner and Preminger (2009a), Lanne and Saikkonen(2007), van der Weide (2002) and Vrontos, Dellaportas and Politis (2003) for the definition andstudy of FF-GARCH models of the form (11.26) where P is not assumed to be orthonormal.The CCC-GARCH model based on (11.17) was introduced by Bollerslev (1990) and extendedto (11.18) by Jeantheau (1998). A sufficient condition for strict stationarity and the existence offourth-order moments of the CCC-GARCH(p, q) is established by Aue et al. (2009). The DCCformulations based on (11.19) and (11.20) were proposed, respectively, by Tse and Tsui (2002),and Engle (2002a). The single-factor model (11.24), which can be viewed as a dynamic versionof the capital asset pricing model of Sharpe (1964), was proposed by Engle, Ng and Rothschild(1990). The main references on the O-GARCH or PC-GARCH, models are Alexander (2002) andDing and Engle (2001). See van der Weide (2002) and Boswijk and van der Weide (2006) forreferences on the GO-GARCH model. Hafner (2003) and He and Terasvirta (2004) studied thefourth-order moments of multivariate GARCH models. Dynamic conditional correlations modelswere introduced by Engle (2002a) and Tse and Tsui (2002). These references, and those given inthe text, can be complemented by the recent surveys by Bauwens, Laurent and Rombouts (2006)and Silvennoinen and Terasvirta (2008), and by the book by Engle (2009).

Jeantheau (1998) gave general conditions for the strong consistency of the QMLE for mul-tivariate GARCH models. Comte and Lieberman (2003) showed the consistency and asymptoticnormality of the QMLE for the BEKK formulation. Asymptotic results were established by Lingand McAleer (2003a) for the CCC formulation of an ARMA-GARCH, and by Hafner and Pre-minger (2009a) for a factor GARCH model of the FF-GARCH form. Theorems 11.7 and 11.8 areconcerned with the CCC formulation, and allow us to study a subclass of the models consideredby Ling and McAleer (2003a), but do not cover the models studied by Comte and Lieberman(2003) or those studied by Hafner and Preminger (2009b). Theorems 11.7-11.8 are mainly ofinterest because that they do not require any moment on the observed process and do not usehigh-level assumptions. For additional information on identifiability, in particular on the echelonform, one may for instance refer to Hannan (1976), Hannan and Deistler (1988), Lutkepohl (1991)and Reinsel (1997).

Portmanteau tests on the normalized residuals of multivariate GARCH processes were pro-posed, in particular, by Tse (2002), Duchesne and Lalancette (2003).

Bardet and Wintenberger (2009) established the strong consistency and asymptotic normalityof the QMLE for a general class of multidimensional causal processes.

Page 324: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

308 GARCH MODELS

Among models not studied in this book are the spline GARCH models in which the volatility iswritten as a product of a slowly varying deterministic component and a GARCH-type component.These models were introduced by Engle and Rangel (2008), and their multivariate generalizationis due to Hafner and Linton (2010).

11.6 Exercises

11.1 (More or less parsimonious representations)Compare the number of parameters of the various GARCH(p, q) representations, as a func-tion of the dimension m.

11.2 (Identifiability of a matrix rational fraction)Let Aθ (z), Bθ (z), Aθ0 (z) and Bθ0(z) denote square matrices of polynomials. Show that

Bθ (z)−1Aθ (z) = Bθ0(z)

−1Aθ0(z) (11.83)

for all z such that detBθ (z)Bθ0 (z) = 0 if and only if there exists an operator U(z) such that

Aθ (z) = U(z)Aθ0 (z) and Bθ (z) = U(z)Bθ0(z). (11.84)

11.3 (Two independent nondegenerate random variables cannot be equal)Let X and Y be two independent real random variables such that Y = X almost surely. Weaim to prove that X and Y are almost surely constant.

1. Suppose that Var(X) exists. Compute Var(X) and show the stated result in this case.

2. Suppose that X is discrete and P(X = x1)P (X = x2) = 0. Show that necessarily x1 = x2

and show the result in this case.

3. Prove the result in the general case.

11.4 (Duplication and elimination)Consider the duplication matrix Dm and the elimination matrix D+m defined by

vec(A) = Dmvech(A), vech(A) = D+mvec(A),

where A is any symmetric m×m matrix. Show that

D+mDm = Im(m+1)/2.

11.5 (Norm and spectral radius)Show that

‖A‖ := sup‖x‖≤1

‖Ax‖ = ρ1/2(A′A).

11.6 (Elementary results on matrix norms)Show the equalities and inequalities of (11.50)–(11.51).

11.7 (Scalar GARCH)The scalar GARCH model has a volatility of the form

Ht = �+q∑i=1

αiεt−iε′t−i +p∑j=1

βjHt−j ,

Page 325: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

MULTIVARIATE GARCH PROCESSES 309

where the αi and βj are positive numbers. Give the positivity and second-order stationarityconditions.

11.8 (Condition for the Lp and almost sure convergence)Let p ∈ [1,∞[ and let (un) be a sequence of real random variables of Lp such that

E |un − un−1|p ≤ Cρn,

for some positive constant C, and some constant ρ in ]0, 1[. Prove that

un converges almost surely and in Lp

to some random variable u of Lp.

11.9 (An average of correlation matrices is a correlation matrix)Let R and Q be two correlation matrices of the same size and let p ∈ [0, 1]. Show thatpR + (1− p)Q is a correlation matrix.

11.10 (Factors as linear combinations of individual returns)Consider the factor model

Var(εt ε

′t | εu, u < t

) = �+r∑

j=1

λjtβjβ′j ,

where the βj are linearly independent. Show there exist vectors αj such that

Var(εtε

′t | εu, u < t

) = �∗ +r∑

j=1

λ∗j tβjβ′j ,

where the λ∗j t are conditional variances of the portfolios α′j εt . Compute the conditionalcovariance between these factors.

11.11 (BEKK representation of factor models)Consider the factor model

Ht = �+r∑

j=1

λjtβjβ′j , λjt = ωj + aj ε

2j t−1 + bj λjt−1,

where the βj are linearly independent, ωj > 0, aj ≥ 0 and 0 ≤ bj < 1 for j = 1, . . . , r .Show that a BEKK representation holds, of the form

Ht = �∗ +K∑k=1

Akεt−1ε′t−1A

′k +

K∑k=1

BkHt−1B′k.

11.12 (PCA of a covariance matrix)Let X be a random vector of Rm with variance matrix .

1. Find the (or a) first principal component of X, that is a random variable C1 = u′1X ofmaximal variance, where u′1u1 = 1. Is C1 unique?

2. Find the second principal component, that is, a random variable C2 = u′2X of maximalvariance, where u′2u2 = 1 and Cov(C1, C2) = 0.

3. Find all the principal components.

Page 326: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

310 GARCH MODELS

11.13 (BEKK-GARCH models with a diagonal representation)Show that the matrices A(i) and B(j) defined in (11.21) are diagonal when the matrices Aik

and Bjk are diagonal.

11.14 (Determinant of a block companion matrix)If A and D are square matrices, with D invertible, we have

det

(A B

C D

)= det(D)det(A− BD−1C).

Use this property to show that matrix B in Corollary 11.1 satisfies

det(B − λImp) = (−1)mpdet{λpIm − λp−1B1 − · · · − λBp−1 − Bp

}.

11.15 (Eigenvalues of a product of positive definite matrices)Let A and B denote symmetric positive definite matrices of the same size. Show that ABis diagonalizable and that its eigenvalues are positive.

11.16 (Positive definiteness of a sum of positive semi-definite matrices)Consider two matrices of the same size, symmetric and positive semi-definite, of the form

A =(

A11 A12

A21 A22

)and B =

(B11 00 0

),

where A11 and B11 are also square matrices of the same size. Show that if A22 and B11 arepositive definite, then so is A+ B.

11.17 (Positive definite matrix and almost surely positive definite matrix)Let A by a symmetric random matrix such that for all real vectors c = 0,

c′Ac> 0 almost surely.

Show that this does not entail that A is almost surely positive definite.

Page 327: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

12

Financial Applications

In this chapter we discuss several financial applications of GARCH models. In connecting thesemodels with those frequently used in mathematical finance, one is faced with the problem that thelatter are generally written in continuous time. We start by studying the relation between GARCHand continuous-time processes. We present sufficient conditions for a sequence of stochastic dif-ference equations to converge in distribution to a stochastic differential equation as the lengthof the discrete time intervals between observations goes to zero. We then apply these results toGARCH(1, 1)-type models. The second part of this chapter is devoted to the pricing of deriva-tives. We introduce the notion of the stochastic discount factor and show how it can be used inthe GARCH framework. The final part of the chapter is devoted to risk measurement.

12.1 Relation between GARCH and Continuous-TimeModels

Continuous-time models are central to mathematical finance. Most theoretical results on derivativepricing rely on continuous-time processes, obtained as solutions of diffusion equations. However,discrete-time models are the most widely used in applications. The literature on discrete-timemodels and that on continuous-time models developed independently, but it is possible to establishconnections between the two approaches.

12.1.1 Some Properties of Stochastic Differential EquationsThis first section reviews basic material from diffusion processes, which will be known to manyreaders. On some probability space (�,A, P ), a d-dimensional process {Wt ; 0 ≤ t <∞} is calledstandard Brownian motion if W0 = 0 a.s., for s ≤ t , the increment Wt −Ws is independent ofσ {Wu;u ≤ s} and is N (0, (t − s)Id ) distributed, where Id is the d × d identity matrix. Brownianmotion is a Gaussian process and admits a version with continuous paths.

A stochastic differential equation (SDE) in Rp is an equation of the form

dXt = µ(Xt)dt + σ(Xt )dWt, 0 ≤ t <∞, X0 = x0, (12.1)

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 328: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

312 GARCH MODELS

where x0 ∈ Rp, µ and σ are measurable functions, defined on Rp and respectively taking valuesin Rp and Mp×d , the space of p × d matrices. Here we only consider time-homogeneous SDEs,in which the functions µ and σ do not depend on t . A process (Xt )t∈[0,T ] is a solution of thisequation, and is called a diffusion process , if it satisfies

Xt = x0 +∫ t

0µ(Xs)ds +

∫ t

0σ(Xs)dWs.

Existence and uniqueness of a solution require additional conditions on the functions µ and σ .The simplest conditions require Lipschitz and sublinearity properties:

‖µ(x)− µ(y)‖ + ‖σ(x)− σ(y)‖ ≤ K‖x − y‖,‖µ(x)‖2 + ‖σ(x)‖2 ≤ K2(1+ ‖x‖2),

where t ∈ [0,+∞[, x, y ∈ Rp, and K is a positive constant. In these inequalities, ‖ · ‖ denotesa norm on either Rp or Mp×d . These hypotheses also ensure the ‘nonexplosion’ of the solutionon every time interval of the form [0, T ] with T > 0 (see Karatzas and Schreve, 1988, Theorem5.2.9). They can be considerably weakened, in particular when p = d = 1. The term µ(Xt) iscalled the drift of the diffusion, and the term σ(Xt ) is called the volatility . They have the followinginterpretation:

µ(x) = limτ→0

τ−1E(Xt+τ −Xt | Xt = x), (12.2)

σ(x)σ (x)′ = limτ→0

τ−1Var(Xt+τ −Xt | Xt = x). (12.3)

These relations can be generalized using the second-order differential operator defined, in the casep = d = 1, by

L = µ∂

∂x+ 1

2σ 2 ∂2

∂x2.

Indeed, for a class of twice continuously differentiable functions f , we have

Lf (x) = limτ→0

τ−1{E(f (Xt+τ ) | Xt = x)− f (x)}.

Moreover, the following property holds: if φ is a twice continuously differentiable function withcompact support, then the process

Yt = φ(Xt )− φ(X0)−∫ t

0Lφ(Xs)ds

is a martingale with respect to the filtration (Ft ), where Ft is the σ -field generated by {Ws, s ≤ t}.This result admits a reciprocal which provides a useful characterization of diffusions. Indeed, itcan be shown that if, for a process (Xt), the process (Yt ) just defined is a Ft -martingale, for aclass of sufficiently smooth functions φ, then (Xt) is a diffusion and solves (12.1).

Stationary Distribution

In certain cases the solution of an SDE admits a stationary distribution, but in general this distribu-tion is not available in explicit form. Wong (1964) showed that, for model (12.1) in the univariatecase (p = d = 1) with σ(·) ≥ 0, if there exists a function f that solves the equation

Page 329: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 313

f (x)µ(x) = 1

2

d

dx{f (x)σ 2(x)}, (12.4)

and belongs to the Pearson family of distributions, that is, of the form

f (x) = λxaeb/x, x > 0, (12.5)

where a < −1 and b < 0, then (12.1) admits a stationary solution with density f .

Example 12.1 (Linear model) A linear SDE is an equation of the form

dXt = (ω + µXt)dt + σXtdWt, (12.6)

where ω,µ and σ are constants. For any initial value x0, this equation admits a strictly positivesolution if ω ≥ 0, x0 ≥ 0 and (ω, x0) = (0, 0) (Exercise 12.1). If f is assumed to be of the form(12.5), solving (12.4) leads to

a = −2(

1− µ

σ 2

), b = −2ω

σ 2.

Under the constraints

ω> 0, σ = 0, ζ := 1− 2µ

σ 2> 0,

we obtain the stationary density

f (x) = 1

�(ζ )

(2ω

σ 2

)ζexp

(−2ω

σ 2x

)x−1−ζ , x > 0,

where � denotes the gamma function. If this distribution is chosen for the initial distribution (thatis, the law of X0), then the process (Xt) is stationary and its inverse follows a gamma distribution,1

1

Xt

∼ �

(2ω

σ 2, ζ

). (12.7)

12.1.2 Convergence of Markov Chains to Diffusions

Consider a Markov chain Z(τ) = (Z(τ)kτ )k∈N with values in Rd , indexed by the time unit τ > 0. We

transform Z(τ) into a continuous-time process, (Z(τ)t )t∈R+ , by means of the time interpolation

Z(τ)t = Z

(τ)kτ if kτ ≤ t < (k + 1)τ.

Under conditions given in the next theorem, the process (Z(τ)t ) converges in distribution to a

diffusion. Denote by ‖ · ‖ the Euclidean norm on Rd .

1 The �(a, b) density, for a, b > 0, is defined on R+ by

f (x) = ab

�(b)e−axxb−1.

Page 330: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

314 GARCH MODELS

Theorem 12.1 (Convergence of (Z(τ)t ) to a diffusion) Suppose there exist continuous applica-

tions µ and σ from Rd to Rd and Mp×d respectively, such that for all r > 0 and for some δ > 0,

limτ→0

sup‖z‖≤r

∣∣∣τ−1E(Z(τ)(k+1)τ − Z

(τ)kτ | Z(τ)

kτ = z)− µ(z)

∣∣∣ = 0, (12.8)

limτ→0

sup‖z‖≤r

∣∣∣τ−1Var(Z(τ)(k+1)τ − Z

(τ)kτ | Z(τ)

kτ = z)− σ(z)σ (z)′

∣∣∣ = 0, (12.9)

limτ→0 sup‖z‖≤r

τ−(2+δ)/2E(‖Z(τ)

(k+1)τ − Z(τ)kτ ‖2+δ | Z(τ)

kτ = z)<∞. (12.10)

Then, if the equation

dZt = µ(Zt )dt + σ(Zt )dWt , 0 ≤ t <∞, Z0 = z0, (12.11)

admits a solution (Zt ) which is unique in distribution, and if Z(τ)0 converges in distribution to z0,

then the process (Z(τ)t ) converges in distribution to (Zt ).

Remark 12.1 Condition (12.10) ensures, in particular, that by applying the Markov inequality,for all ε > 0,

limτ→0

τ−1P(∥∥∥Z(τ)

(k+1)τ − Z(τ)kτ

∥∥∥>ε | Z(τ)kτ = z

)= 0.

As a consequence, the limiting process has continuous paths.

Euler Discretization of a Diffusion

Diffusion processes do not admit an exact discretization in general. An exception is the geometricBrownian motion , defined as a solution of the real SDE

dXt = µXtdt + σXtdWt (12.12)

where µ and σ are constants. It can be shown that if the initial value x0 is strictly positive, thenXt ∈ (0,∞) for any t > 0. By Ito’s lemma,2 we obtain

d log(Xt ) =(µ− σ 2

2

)dt + σdWt (12.13)

and then, by integration of this equation between times kτ and (k + 1)τ , we get the discretizedversion of model (12.12),

logX(k+1)τ = logXkτ +(µ− σ 2

2

)τ +√τσε(k+1)τ , (εkτ )

iid∼ N (0, 1). (12.14)

For general diffusions, an explicit discretized model does not exist but a natural approximation,called the Euler discretization , is obtained by replacing the differential elements by increments.The Euler discretization of the SDE (12.1) is then given, for the time unit τ , by

2 For Yt = f (t,Xt ) where f : (t, x) ∈ [0, T ]× R �→ f (t, x) ∈ R is continuous, continuously differentiablewith respect to the first component and twice continuously differentiable with respect to the second component,if (Xt ) satisfies dXt = µtdt + σtdWt where µt and σt are adapted processes such that a.s.

∫ T0 |µt |dt <∞

and∫ T

0 σ 2t dt <∞, we have

dYt = ∂f

∂t(t, Xt )dt + ∂f

∂x(t,Xt )dXt + 1

2

∂2f

∂x2(t, Xt )σ

2t dt.

Page 331: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 315

X(k+1)τ = Xkτ + τµ(Xkτ )+√τσ (Xkτ )ε(k+1)τ , (εkτ )

iid∼ N (0, 1). (12.15)

The Euler discretization of a diffusion converges in distribution to this diffusion (Exercise 12.2).

Convergence of GARCH-M Processes to Diffusions

It is natural to assume that the return of a financial asset increases with risk. Economic agents whoare risk-averse must receive compensation when they own risky assets. ARMA-GARCH type timeseries models do not take this requirement into account because the conditional mean and varianceare modeled separately. A simple way to model the dependence between the average return andrisk is to specify the conditional mean of the returns in the form

µt = ξ + λσt ,

where ξ and λ are parameters. By doing so we obtain, when σ 2t is specified as an ARCH, a

particular case of the ARCH in mean (ARCH-M) model, introduced by Engle, Lilien and Robins(1987). The parameter λ can be interpreted as the price of risk and can thus be assumed to bepositive. Other specifications of the conditional mean are obviously possible. In this section wefocus on a GARCH(1, 1)-M model of the form{

Xt = Xt−1 + f (σt )+ σtηt , (ηt ) iid (0, 1),g(σt ) = ω + a(ηt−1)g(σt−1),

(12.16)

where ω> 0, f is a continuous function from R+ to R, g is a continuous one-to-one mapfrom R+ to itself and a is a positive function. The previous interpretation implies that f isincreasing, but at this point it is not necessary to make this assumption. When g(x) = x2 anda(x) = αx2 + β with α ≥ 0, β ≥ 0, we get the classical GARCH(1, 1) model. Asymmetric effectscan be introduced, for instance by taking g(x) = x and a(x) = α+x+ − α−x− + β, with x+ =max(x, 0), x− = min(x, 0), α+ ≥ 0, α− ≥ 0, β ≥ 0.

Observe that the constraint for the existence of a strictly stationary and nonanticipative solution(Yt ), with Yt = Xt −Xt−1, is written as

E log{a(ηt )} < 0,

by the techniques studied in Chapter 2.Now, in view of the Euler discretization (12.15), we introduce the sequence of models indexed

by the time unit τ , defined by{X(τ)kτ = X

(τ)(k−1)τ + f (σ

(τ)kτ )τ +

√τσkτ η

(τ)kτ , (η

(τ)kτ ) iid (0, 1)

g(σ(τ)(k+1)τ ) = ωτ + aτ (η

(τ)kτ )g(σ

(τ)kτ ),

(12.17)

for k > 0, with initial values X(τ)0 = x0, σ (τ)0 = σ0 > 0 and assuming E(η

(τ)kτ )

4 <∞. The intro-duction of a delay k in the second equation is due to the fact that σ(k+1)τ belongs to the σ -fieldgenerated by ηkτ and its past values.

Noting that the pair Zkτ = (X(k−1)τ , g(σkτ )) defines a Markov chain, we obtain its limitingdistribution by application of Theorem 12.1. We have, for z = (x, g(σ )),

τ−1E(X(τ)kτ −X

(τ)(k−1)τ | Z(τ)

(k−1)τ = z) = f (σ ), (12.18)

τ−1E(g(σ(τ)(k+1)τ )− g(σ

(τ)kτ ) | Z(τ)

(k−1)τ = z) = τ−1ωτ + τ−1{Eaτ (η(τ)kτ )− 1}g(σ ).(12.19)

Page 332: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

316 GARCH MODELS

This latter quantity converges if

limτ→0

τ−1ωτ = ω, limτ→0

τ−1{Eaτ (η(τ)kτ )− 1} = −δ, (12.20)

where ω and δ are constants.Similarly,

τ−1Var[X(τ)kτ −X

(τ)(k−1)τ | Z(τ)

(k−1)τ = z] = σ 2, (12.21)

τ−1Var[g(σ (τ)(k+1)τ )− g(σ(τ)kτ ) | Z(τ)

(k−1)τ = z] = τ−1Var{aτ (η(τ)kτ )}g2(σ ), (12.22)

which converges if and only iflimτ→0

τ−1Var{aτ (η(τ)kτ )} = ζ, (12.23)

where ζ is a positive constant. Finally,

τ−1Cov[X(τ)kτ −X

(τ)(k−1)τ , g(σ

(τ)(k+1)τ )− g(σ

(τ)kτ ) | Z(τ)

(k−1)τ = z]

= τ−1/2Cov{η(τ)kτ , aτ (η(τ)kτ )}σg(σ ), (12.24)

converges if and only iflimτ→0

τ−1/2Cov{η(τ)kτ , aτ (η(τ)kτ )} = ρ, (12.25)

where ρ is a constant such that ρ2 ≤ ζ .Under these conditions we thus have

limτ→0

τ−1Var[Z(τ)kτ − Z

(τ)(k−1)τ | Z(τ)

(k−1)τ = z] = A(σ) =(

σ 2 ρσg(σ )

ρσg(σ ) ζg2(σ )

).

Moreover, we have

A(σ) = B(σ)B ′(σ ), where B(σ) =(

σ 0ρg(σ )

√ζ − ρ2g(σ )

).

We are now in a position to state our next result.

Theorem 12.2 (Convergence of (X(τ)t , g(σ

(τ)t )) to a diffusion) Under Conditions (12.20),

(12.23) and (12.25), and if, for δ > 0,

limτ→0τ−(1+δ)E{aτ (η(τ)kτ )− 1}2(1+δ) <∞, (12.26)

the limiting process when τ → 0, in the sense of Theorem 12.1, of the sequence of solutions ofmodels (12.17) is the bivariate diffusion{

dXt = f (σt )dt + σtdW1t

dg(σt ) = {ω − δg(σt )}dt + g(σt )(ρdW 1

t +√ζ − ρ2dW 2

t

),

(12.27)

where (W 1t ) and (W 2

t ) are independent Brownian motions, with initial values x0 and σ0.

Proof. It suffices to verify the conditions of Theorem 12.1. It is immediate from (12.18), (12.19),(12.21), (12.22), (12.24) and the hypotheses on f and g, that, in (12.8) and (12.9), the limits areuniform on every ball of R2. Moreover, for τ ≤ τ0 and δ < 2, we have

τ−2+δ

2 E

(∣∣∣X(τ)kτ −X

(τ)(k−1)τ

∣∣∣2+δ | Z(τ)kτ = z

)= τ−

2+δ2 E

(∣∣∣f (σ)τ +√τση(τ)0

∣∣∣2+δ)≤ E

(|f (σ )|√τ0 + σ |η(τ)0 |

)2+δ,

Page 333: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 317

which is bounded uniformly in σ on every compact. On the other hand, introducing the L2+δ normand using the triangle and Holder’s inequalities,

τ−2+δ

2 E

(∣∣∣g(σ (τ)(k+1)τ )− g(σ(τ)kτ )

∣∣∣2+δ | Z(τ)kτ = z

)= E

∣∣∣τ−1/2ωτ + τ−1/2{aτ (η(τ)kτ )− 1}g(σ )∣∣∣2+δ

=∥∥∥τ−1/2ωτ + τ−1/2{aτ (η(τ)kτ )− 1}g(σ )

∥∥∥2+δ2+δ

≤(τ−1/2ωτ + τ−1/2

∥∥∥aτ (η(τ)kτ )− 1∥∥∥

2+δg(σ )

)2+δ

≤{τ−1/2ωτ + τ−1/2

(E|aτ (η(τ)kτ )− 1|2E|aτ (η(τ)kτ )− 1|2(1+δ)

) 12(2+δ)

g(σ )

}2+δ.

Since the limit superior of this quantity, when τ → 0, is bounded uniformly in σ on every compact,we can conclude that condition (12.10) is satisfied.

It remains to show that the SDE (12.27) admits a unique solution. Note that g(σt ) satisfies alinear SDE given by

dg(σt ) = {ω − δg(σt )}dt +√ζg(σt )dW

3t , (12.28)

where (W 3t ) is the Brownian motion W 3

t = (ρW 1t +

√ζ − ρ2W 2

t )/√ζ . This equation admits a

unique solution (Exercise 12.1)

g(σt ) = Yt

(g(σ0)+ ω

∫ t

0

1

Ysds

), where Yt = exp{−(δ + ζ/2)t +

√ζW 3

t }.

The function g being one-to-one, we deduce σt and the solution (Xt ), uniquely obtained as

Xt = x0 +∫ t

0f (σs)ds +

∫ t

0σsdW

1s . �

Remark 12.2

1. It is interesting to note that the limiting diffusion involves two Brownian motions, whereasGARCH processes involve only one noise. This can be explained by the fact that, to obtain aMarkov chain, it is necessary to consider the pair (X(k−1)τ , g(σkτ )). The Brownian motionsinvolved in the equations of Xt and g(σt ) are independent if and only if ρ = 0. This is,for instance, the case when the function aτ is even and the distribution of the iid processis symmetric.

2. Equation (12.28) shows that g(σt ) is the solution of a linear model of the form (12.6). Fromthe study of this model, we know that under the constraints

ω> 0, ζ > 0, 1+ 2δ

ζ> 0,

there exists a stationary distribution for g(σt ). If the process is initialized with this distri-bution, then

1

g(σt )∼ �

(2ω

ζ, 1+ 2δ

ζ

). (12.29)

Page 334: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

318 GARCH MODELS

Example 12.2 (GARCH(1, 1)) The volatility of the GARCH(1, 1) model is obtained for g(x) =x2, a(x) = αx2 + β. Suppose for simplicity that the distribution of the process (η

(τ)kτ ) does not

depend on τ and admits moments of order 4(1+ δ), for δ > 0. Denote by µr the rth-order momentof this process. Conditions (12.20) and (12.23) take the form

limτ→0

τ−1ωτ = ω, limτ→0

τ−1(µ4 − 1)α2τ = ζ, lim

τ→0τ−1(ατ + βτ − 1) = −δ.

A choice of parameters satisfying these constraints is, for instance,

ωτ = ωτ, ατ =√

ζ τ

µ4 − 1, βτ = 1− ατ − δτ.

Condition (12.25) is then automatically satisfied with

ρ =√ζ

µ3√µ4 − 1

,

as well as (12.26). The limiting diffusion takes the form dXt = f (σt )dt + σtdW1t

dσ 2t = {ω − δσ 2

t }dt +√

ζ

µ4−1σ2t

(µ3dW

1t +

√µ4 − 1− µ2

3dW2t

)

and, if the law of η(τ)kτ is symmetric,{dXt = f (σt )dt + σtdW

1t

dσ 2t = {ω − δσ 2

t }dt +√ζσ 2

t dW2t .

(12.30)

Note that, with other choices of the rates of convergence of the parameters, we can obtain a limitingprocess involving only one Brownian motion but with a degenerate volatility equation, in the sensethat it is an ordinary differential equation (Exercise 12.3).

Example 12.3 (TGARCH(1, 1)) For g(x) = x, a(x) = α+x+ − α−x− + β, we have the volatil-ity of the threshold GARCH(1, 1) model. Under the assumptions of the previous example, letµr+ = E(η+0 )

r and µr− = E(−η−0 )r . Conditions (12.20) and (12.23) take the form

limτ→0

τ−1ωτ = ω, limτ→0

τ−1{α2τ+µ2+ + α2

τ−µ2− − (ατ+µ1+ + ατ−µ1−)2} = ζ,

limτ→0

τ−1(ατ+µ1+ + ατ−µ1− + βτ − 1) = −δ.

These constraints are satisfied by taking, for instance, ατ+ = √τα+, ατ− = √τα− and

ωτ = ωτ, α2+µ2+ + α2

−µ2− − (α+µ1+ + α−µ1−)2 = ζ, βτ = 1− ατ+µ1+ − ατ−µ1− − δτ.

Condition (12.25) is then satisfied with ρ = α+µ2+ − α−µ2−, as well as condition (12.26). Thelimiting diffusion takes the form{

dXt = f (σt )dt + σtdW1t

dσt = {ω − δσt }dt + σt

(ρdW 1

t +√ζ − ρ2dW 2

t

).

Page 335: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 319

In particular, if the law of η(τ)kτ is symmetric and if ατ+ = ατ−, the correlation between the Brownianmotions of the two equations vanishes and we get a limiting diffusion of the form{

dXt = f (σt )dt + σtdW1t

dσt = {ω − δσt }dt +√ζσtdW

2t .

By applying the Ito lemma to this equation, we obtain

dσ 2t = 2{ω − δσt }σtdt + 2

√ζσ 2

t dW2t ,

which shows that, even in the symmetric case, the limiting diffusion does not coincide with thatobtained in (12.30) for the classical GARCH model. When the law of η

(τ)kτ is symmetric and

ατ+ = ατ−, the asymmetry in the discrete-time model results in a correlation between the twoBrownian motions of the limiting diffusion.

12.2 Option Pricing

Classical option pricing models rely on independent Gaussian returns processes. These assumptionsare incompatible with the empirical properties of prices, as we saw, in particular, in the introductorychapter. It is thus natural to consider pricing models founded on more realistic, GARCH-type, orstochastic volatility, price dynamics.

We start by briefly recalling the terminology and basic concepts related to the Black–Scholesmodel. Appropriate financial references are provided at the end of this chapter.

12.2.1 Derivatives and OptionsThe need to hedge against several types of risk gave rise to a number of financial assets calledderivatives . A derivative (derivative security or contingent claim) is a financial asset whose payoffdepends on the price process of an underlying asset : action, portfolio, stock index, currency, etc.The definition of this payoff is settled in a contract.

There are two basic types of option. A call option (put option) or more simply a call (put) isa derivative giving to the holder the right, but not the obligation, to buy (sell) an agreed quantityof the underlying asset S, from the seller of the option on (or before) the expiration date T , for aspecified price K , the strike price or exercise price. The seller (or ‘writer’) of a call is obliged to sellthe underlying asset should the buyer so decide. The buyer pays a fee, called a premium , for thisright. The most common options, since their introduction in 1973, are the European options , whichcan be exercised only at the option expiry date, and the American options , which can be exercisedat any time during the life of the option. For a European call option, the buyer receives, at the expirydate, the amount max(ST −K, 0) = (ST −K)+ since the option will not be exercised unless it is‘in the money’. Similarly, for a put, the payoff at time T is (K − ST )

+. Asset pricing involvesdetermining the option price at time t . In what follows, we shall only consider European options.

12.2.2 The Black–Scholes ApproachConsider a market with two assets, an underlying asset and a risk-free asset. The Black and Scholes(1973) model assumes that the price of the underlying asset is driven by a geometric Brownianmotion

dSt = µStdt + σStdWt, (12.31)

Page 336: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

320 GARCH MODELS

where µ and σ are constants and (Wt ) is a standard Brownian motion. The risk-free interest rater is assumed to be constant. By Ito’s lemma, we obtain

d log(St ) =(µ− σ 2

2

)dt + σdWt, (12.32)

showing that the logarithm of the price follows a generalized Brownian motion, with drift µ−σ 2/2 and constant volatility. Integrating (12.32) between times t − 1 and t yields the discretizedversion

log

(St

St−1

)= µ− σ 2

2+ σεt , εt

iid∼ N (0, 1). (12.33)

The assumption of constant volatility is obviously unrealistic. However, this model allows forexplicit formulas for option prices, or more generally for any derivative based on the underlyingasset S, with payoff g(ST ) at the expiry date T . The price of this product at time t is uniqueunder certain regularity conditions3 and is denoted by C(S, t) for simplicity. The set of conditionsensuring the uniqueness of the derivative price is referred to as the complete market hypothesis . Inparticular, these conditions imply the absence of arbitrage opportunities , that is, that there is no‘free lunch’. It can be shown4 that the derivative price is

C(S, t) = e−r(T−t)Eπ [g(ST ) | St ], (12.34)

where the expectation is computed under the probability π corresponding to the equation

dSt = rStdt + σStdW∗t , (12.35)

where (W ∗t ) denotes a standard Brownian motion. The probability π is called the risk-neutral

probability, because under π the expected return of the underlying asset is the risk-free interestrate r . It is important to distinguish this from the historic probability , that is, the law under whichthe data are generated (here defined by model (12.31)). Under the risk-neutral probability, the priceprocess is still a geometric Brownian motion, with the same volatility σ but with drift r . Notethat the initial drift term, µ, does not play a role in the pricing formula (12.34). Moreover, theactualized price Xt = e−rt St satisfies dXt = σXtdW

∗t . This implies that the actualized price is

a martingale for the risk-neutral probability: e−r(T−t)Eπ [ST | St ] = St . Note that this formula isobvious in view of (12.34), by considering the underlying asset as a product with payoff ST attime T .

The Black–Scholes formula is an explicit version of (12.34) when the derivative is a call, thatis, when g(ST ) = (K − ST )

+, given by

C(S, t) = St�(xt + σ√τ)− e−rτK�(xt ), (12.36)

where � is the conditional distribution function (cdf) of the N (0, 1) distribution and

τ = T − t, xt = log(St /e−rτK)σ√τ

− 1

2σ√τ .

3 These conditions are the absence of transaction costs, the possibility of constructing a portfolio with anyallocations (sign and size) of the two assets, the possibility of continuously adjusting the composition of theportfolio and the existence of a price for the derivative depending only on the present and past values of St .

4 The three classical methods for proving this formula are the method based on the binomial model (Cox,Ross and Rubinstein, 1979), the method based on the resolution of equations with partial derivatives and themethod based on the martingale theory.

Page 337: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 321

In particular, it can be seen that if St is large compared to K , we have �(xt + σ√τ) ≈ �(xt ) ≈ 1

and the call price is approximately given by St − e−rτK , that is, the current underlying priceminus the actualized exercise price. The price of a put P(S, t) follows from the put–call parityrelationship (Exercise 12.4): C(S, t) = P(S, t)+ St − e−rτK .

A simple computation (Exercise 12.5) shows that the European call option price is an increasingfunction of St , which is intuitive. The derivative of C(S, t) with respect to St , called delta , is used inthe construction of a riskless hedge, a portfolio obtained from the risk-free and risky assets allowingthe seller of a call to cover the risk of a loss when the option is exercised. The construction of ariskless hedge is often referred to as delta hedging .

The previous approach can be extended to other price processes, in particular if (St ) is solutionof a SDE of the form

dSt = µ(t, St )dt + σ(t, St )dWt

under regularity assumptions on µ and σ . When the geometric Brownian motion for St is replacedby another dynamics, the complete market property is generally lost.5

12.2.3 Historic Volatility and Implied VolatilitiesNote that, from a statistical point of view, the sole unknown parameter in the Black–Scholespricing formula (12.36) is the volatility of the underlying asset. Assuming that the prices follow ageometric Brownian motion, application of this formula thus requires estimating σ . Any estimateof σ based on a history of prices S0, . . . , Sn is referred to as historic volatility . For geometricBrownian motion, the log-returns log(St /St−1) are, by (12.32), iid N (µ− σ 2/2, σ 2) distributedvariables. Several estimation methods for σ can be considered, such as the method of momentsand the maximum likelihood method (Exercise 12.7). An estimator of C(S, t) is then obtained byreplacing σ by its estimate.

Another approach involves using option prices. In practice, traders usually work with theso-called implied volatilities . These are the volatilities implied by option prices observed in themarket. Consider a European call option whose price at time t is Ct . If St denotes the price of theunderlying asset at time t , an implied volatility σ It is defined by solving the equation

Ct = St�(xt + σIt√τ)− e−rτK�(xt ), xt = log(St /e−rτK)

σ It√τ

− 1

2σ It√τ .

This equation cannot be solved analytically and numerical procedures are called for. Note that thesolution is unique because the call price is an increasing function of σ (Exercise 12.8).

If the assumptions of the Black–Scholes model, that is, the geometric Brownian motion, aresatisfied, implied volatilities calculated from options with different characteristics but the sameunderlying asset should coincide with the theoretical volatility σ . In practice, implied volatilitiescalculated with different strikes or expiration dates are very unstable, which is not surprising sincewe know that the geometric Brownian motion is a misspecified model.

12.2.4 Option Pricing when the Underlying Process is a GARCHIn discrete time, with time unit δ, the binomial model (in which, given St , St+δ can only take twovalues) allows us to define a unique risk-neutral probability, under which the actualized price is

5 This is the case for the stochastic volatility model of Hull and White (1987), defined by{dSt = φStdt + σtSt dWt

dσ 2t = µσ 2

t dt + ξσ 2t dW

∗t ,

where Wt,W∗t are independent Brownian motions.

Page 338: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

322 GARCH MODELS

a martingale. This model is used, in the Cox, Ross and Rubinstein (1979) approach, as an analogin discrete time of the geometric Brownian motion. Intuitively, the assumption of a completemarket is satisfied (in the binomial model as well as in the Black–Scholes model) because thenumber of assets, two, coincides with the number of states of the world at each date. Apart fromthis simple situation, the complete market property is generally lost in discrete time. It followsthat a multiplicity of probability measures may exist, under which the prices are martingales, andconsequently, a multiplicity of pricing formulas such as (12.34). Roughly speaking, there is toomuch variability in prices between consecutive dates.

To determine options prices in incomplete markets, additional assumptions can be made onthe risk premium and/or the preferences of the agents. A modern alternative relies on the conceptof stochastic discount factor, which allows pricing formulas in discrete time similar to those incontinuous time to be obtained.

Stochastic Discount Factor

We start by considering a general setting. Suppose that we observe a vector process Z = (Zt ) andlet It denote the information available at time t , that is, the σ -field generated by {Zs, s ≤ t}. Weare interested in the pricing of a derivative whose payoff is g = g(ZT ) at time T . Suppose thatthere exists, at time t < T , a price Ct(Z, g, T ) for this asset. It can be shown that, under mildassumptions on the function g �→ Ct (Z, g, T ),6 we have the representation

C(g, t, T ) = E[g(ZT )Mt,T | It ], where Mt,T > 0, Mt,T ∈ IT . (12.37)

The variable Mt,T is called the stochastic discount factor (SDF) for the period [t, T ]. The SDFintroduced in representation (12.37) is not unique and can be parameterized. The formula applies,in particular, to the zero-coupon bond of expiry date T , defined as the asset with payoff 1 at timeT . Its price at t is the conditional expectation of the SDF,

B(t, T ) = C(1, t, T ) = E[Mt,T | It ].

It follows that (12.37) can be written as

C(g, t, T ) = B(t, T )E

[g(ZT )

Mt,T

B(t, T )| It]. (12.38)

Forward Risk-Neutral Probability

Observe that the ratio Mt,T /B(t, T ) is positive and that its mean, conditional on It , is 1. Conse-quently, a probability change removing this factor in formula (12.38) can be done.7 Denoting byπt,T the new probability and by Eπt,T the expectation under this probability, we obtain the pricingformula

C(g, t, T ) = B(t, T )Eπt,T [g(ZT ) | It ]. (12.39)

The probability law πt,T is called forward risk-neutral probability. Note the analogy between thisformula and (12.34), the latter corresponding to a particular form of B(t, T ). To make this formulaoperational, it remains to specify the SDF.

6 In particular, linearity and positivity (see Hansen and Richard, 1987; Gourieroux and Tiomo, 2007).7 If X is a random variable with distribution PX and G is a function with real positive values such that

E{G(X)} = 1, we have, interpreting G as the density of some measure P ∗X with respect to PX ,

E{XG(X)} =∫xG(x)dPX(x) =

∫xdP ∗X(x) =: E∗(X).

Page 339: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 323

Risk-Neutral Probability

As mentioned earlier, the SDF is not unique in incomplete markets. A natural restriction, referredto as a temporal coherence restriction, is given by

Mt,T = Mt,t+1Mt+1,t+2 . . .MT−1,T . (12.40)

On the other hand, the one-step SDFs are constrained by

B(t, t + 1) = E[Mt,t+1 | It ], St = E[St+1Mt,t+1 | It ], (12.41)

where St ∈ It is the price of an underlying asset (or a vector of assets). We have

C(g, t, T ) = B(t, t + 1)E

[g(ZT )

T−1∏i=t+1

B(i, i + 1)T−1∏i=t

Mi,i+1

B(i, i + 1)| It]. (12.42)

Noting that

E

[T−1∏i=t

Mi,i+1

B(i, i + 1)| It]= E

[T−2∏i=t

Mi,i+1

B(i, i + 1)E

(MT−1,T

B(T − 1, T )| IT−1

)| It]

= 1,

we can make a change of probability such that the SDF vanishes. Under a probability law π∗t,T ,called risk-neutral probability, we thus have

C(g, t, T ) = B(t, t + 1)Eπ∗t,T

[g(ZT )

T−1∏i=t+1

B(i, i + 1) | It]. (12.43)

The risk-neutral probability satisfies the temporal coherence property: π∗t,T+1 is related to π∗t,Tthrough the factor MT,T+1/B(T , T + 1). Without (12.40), the risk-neutral forward probabilitydoes not satisfy this property.

Pricing Formulas

One approach to deriving pricing formulas is to specify, parametrically, the dynamics of Zt andof Mt,t+1, taking (12.41) into account.

Example 12.4 (Black–Scholes model) Consider model (12.33) with Zt = log (St/St−1) andsuppose that B(t, t + 1) = e−r . A simple specification of the one-step SDF is given by the affinemodel Mt,t+1 = exp(a + bZt+1), where a and b are constants. The constraints in (12.41) arewritten as

e−r = E exp(a + bZt+1), 1 = E exp{a + (b + 1)Zt+1},that is, in view of the N (µ− σ 2/2, σ 2) distribution of Zt ,

0 = a + r + b

(µ− σ 2

2

)+ b2σ 2

2= a + (b + 1)

(µ− σ 2

2

)+ (b + 1)2σ 2

2.

These equations provide a unique solution (a, b). We then obtain the risk-neutral probabilityπt,t+1 = π through the characteristic function

Eπ(euZt+1) = E

(euZt+1

Mt,t+1

B(t, t + 1)

)= E

(ea+r+(b+u)Zt+1

) = eu(r−σ2/2)+u2σ 2/2.

Page 340: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

324 GARCH MODELS

Because the latter is the characteristic function of the N(r − σ 2/2, σ 2) distribution, we retrieve thegeometric Brownian motion model (12.35) for (St ). The Black–Scholes formula is thus obtainedby specifying an affine exponential SDF with constant coefficients.

Now consider a general GARCH-type model of the form{Zt = log (St/St−1) = µt + εt ,

εt = σtηt , (ηt )iid∼ N (0, 1),

(12.44)

where µt and σt belong to the σ -field generated by the past of Zt , with σt > 0. Suppose that theσ -fields generated by the past of εt , Zt and ηt are the same, and denote this σ -field by It−1.Suppose again that B(t, t + 1) = e−r . Consider for the SDF an affine exponential specificationwith random coefficients, given by

Mt,t+1 = exp(at + btηt+1), (12.45)

where at , bt ∈ It . The constraints (12.41) are written as

e−r = E exp(at + btηt+1 | It ),1 = E exp{at + btηt+1 + Zt+1 | It } = E exp{at + µt+1 + (bt + σt+1)ηt+1 | It },

that is, after simplification,

at = −r − b2t

2, btσt+1 = r − µt+1 −

σ 2t+1

2. (12.46)

As before, these equations provide a unique solution (at , bt ). The risk-neutral probability πt,t+1 isdefined through the characteristic function

Eπt,t+1(euZt+1 | It ) = E(euZt+1Mt,t+1/B(t, t + 1) | It

)= E

(eat+r+uµt+1+(bt+uσt+1)ηt+1 | It

)= exp

(u(µt+1 + btσt+1)+ u2 σ

2t+1

2

)

= exp

(u

(r − σ 2

t+1

2

)+ u2 σ

2t+1

2

).

The last two equalities are obtained by taking into account the constraints on at and bt . Thus,under the probability πt,t+1, the law of the process (Zt ) is given by the model{

Zt = r − σ2t

2 + ε∗t ,

ε∗t = σtη∗t , (η∗t )

iid∼ N (0, 1).(12.47)

The independence of the η∗t follows from the independence between η∗t+1 and It (because η∗t+1has a fixed distribution conditional on It ) and from the fact that η∗t = σ−1

t (Zt − r + σ 2t /2) ∈ It .

The model under the risk-neutral probability is then a GARCH-type model if the variable σ 2t is a

measurable function of the past of ε∗t . This generally does not hold because the relation

ε∗t = µt − r + σ 2t

2+ εt (12.48)

entails that the past of ε∗t is included in the past of εt , but not the reverse.

Page 341: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 325

If the relation (12.48) is invertible, in the sense that there exists a measurable function f suchthat εt = f (ε∗t , ε

∗t−1, . . .), model (12.47) is of the GARCH type, but the volatility σ 2

t can take avery complicated form as a function of the ε∗t−j . Specifically, if the volatility under the historicprobability is that of a classical GARCH(1, 1), we have

σ 2t = ω + α

(r − σ 2

t−1

2− µt−1 + ε∗t−1

)2

+ βσ 2t−1.

Finally, using πt,T , the forward risk-neutral probability for the expiry date T , the price of aderivative is given by the formula

Ct(Z, g, T ) = e−r(T−t)Eπt,T [g(ZT ) | St ] (12.49)

or, under the historic probability, in view of (12.40),

Ct (Z, g, T ) = E[g(ZT )Mt,t+1Mt+1,t+2 . . .MT−1,T | St ]. (12.50)

It is important to note that, with the affine exponential specification of the SDF, the volatilitiescoincide with the two probability measures. This will not be the case for other SDFs (Exercise12.11).

Example 12.5 (Constant coefficient SDF) We have seen that the Black–Scholes formula relieson (i) a Gaussian marginal distribution for the log-returns, and (ii) an affine exponential SDF withconstant coefficients. In the framework of model (12.44), it is thus natural to look for a SDF ofthe same type, with at and bt independent of t . It is immediate from (12.46) that it is necessaryand sufficient to take µt of the form

µt = µ+ λσt − σ 2t

2,

where µ and λ are constants. We thus obtain a model of GARCH in mean type, because theconditional mean is a function of the conditional variance. The volatility in the risk-neutral modelis thus written as

σ 2t = ω + α

{(r − µ)− λσt−1 + ε∗t−1

}2 + βσ 2t−1.

If, moreover, r = µ then under the historic probability the model is expressed aslog (St /St−1) = r + λσt − σ 2

t

2 + εt ,

εt = σtηt , (ηt )iid∼ N (0, 1),

σ 2t = ω + αε2

t−1 + βσ 2t−1, ω > 0, α, β ≥ 0,

(12.51)

and under the risk-neutral probability,log (St /St−1) = r − σ 2

t

2 + ε∗t ,

ε∗t = σtη∗t , (η∗t )

iid∼ N (0, 1),σ 2t = ω + α(ε∗t−1 − λσt−1)

2 + βσ 2t−1.

(12.52)

Under the latter probability, the actualized price e−rt St is a martingale (Exercise 12.9). Note thatin (12.52) the coefficient λ appears in the conditional variance, but the risk has been neutralized(the conditional mean no longer depends on σ 2

t ). This risk-neutral probability was obtained byDuan (1995) using a different approach based on the utility of the representative agent.

Page 342: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

326 GARCH MODELS

Note that σ 2t = ω + σ 2

t−1

{α(λ− η∗t−1)

2 + β}. Under the strict stationarity constraint

E log{α(λ− η∗t )2 + β} < 0,

the variable σ 2t is a function of the past of η∗t and can be interpreted as the volatility of Zt under

the risk-neutral probability. Under this probability, the model is not a classical GARCH(1, 1)unless λ = 0, but in this case the risk-neutral probability coincides with the historic one, whichis not the case in practice.

Numerical Pricing of Option Prices

Explicit computation of the expectation involved in (12.49) is not possible, but the expectation canbe evaluated by simulation. Note that, under πt,T , ST and St are linked by the formula

ST = St exp

{(T − t)r − 1

2

T∑s=t+1

hs +T∑

s=t+1

√hsνs

},

where hs = σ 2s . At time t , suppose that an estimate θ = (λ, ω, α, β) of the coefficients of model

(12.51) is available, obtained from observations S1, . . . , St of the underlying asset. Simulated valuesS(i)T of ST , and thus simulated values Z(i)

T of ZT , for i = 1, . . . , N , are obtained by simulating, atstep i, T − t independent realizations ν(i)s of the N (0, 1) and by setting

S(i)T = St exp

{(T − t)r − 1

2

T∑s=t+1

h(i)s +T∑

s=t+1

√h(i)s ν(i)s

},

where the h(i)s , s = t + 1, . . . , T , are recursively computed from

h(i)s = ω +{α(ν

(i)s−1 − λ)2 + β

}h(i)s−1

taking, for instance, as initial value h(i)t = σ 2

t , the volatility estimated from the initial GARCHmodel (this volatility being computed recursively, and the effect of the initialization being negligiblefor t large enough under stationarity assumptions). This choice can be justified by noting that forSDFs of the form (12.45), the volatilities coincide under the historic and risk-neutral probabilities.Finally, a simulation of the derivative price is obtained by taking

Ct (Z, g, T ) = e−r(T−t)1

N

N∑i=1

g(Z(i)T ).

The previous approach can obviously be extended to more general GARCH models, with largerorders and/or different volatility specifications. It can also be extended to other SDFs.

Empirical studies show that, comparing the computed prices with actual prices observed onthe market, GARCH option pricing provides much better results than the classical Black–Scholesapproach (see, for instance, Sabbatini and Linton, 1998; Hardle and Hafner, 2000).

To conclude this section, we observe that the formula providing the theoretical option pricescan also be used to estimate the parameters of the underlying process, using observed options (see,for instance, Hsieh and Ritchken, 2005).

Page 343: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 327

12.3 Value at Risk and Other Risk Measures

Risk measurement is becoming more and more important in the financial risk management ofbanks and other institutions involved in financial markets. The need to quantify risk typicallyarises when a financial institution has to determine the amount of capital to hold as a protectionagainst unexpected losses. In fact, risk measurement is concerned with all types of risks encounteredin finance. Market risk , the best-known type of risk, is the risk of change in the value of a financialposition. Credit risk , also a very important type of risk, is the risk of not receiving repaymentson outstanding loans, as a result borrower default. Operational risk , which has received more andmore attention in recent years, is the risk of losses resulting from failed internal processes, peopleand systems, or external events. Liquidity risk occurs when, due to a lack of marketability, aninvestment cannot be bought or sold quickly enough to prevent a loss. Model risk can be definedas the risk due to the use of a misspecified model for the risk measurement.

The need for risk measurement has increased dramatically, in the last two decades, due to theintroduction of new regulation procedures. In 1996 the Basel Committee on Banking Supervision (acommittee established by the central bank governors in 1974) prescribed a so-called standardizedmodel for market risk. At the same time the Committee allowed the larger financial institutions todevelop their own internal model . The second Basel Accord (Basel II), initiated in 2001, considersoperational risk as a new risk class and prescribes the use of finer approaches to assess the risk ofcredit portfolios. By using sophisticated approaches the banks may reduce the amount of regulatorycapital (the capital required to support the risk), but in the event of frequent losses a larger amountmay be imposed by the regulator. Parallel developments took place in the insurance sector, givingrise to the Solvency projects.

A risk measure that is used for specifying capital requirements can be thought of as the amountof capital that must be added to a position to make its risk acceptable to regulators. Value at risk(VaR) is arguably the most widely used risk measure in financial institutions. In 1993, the businessbank JP Morgan publicized its estimation method, RiskMetrics , for the VaR of a portfolio. VaRis now an indispensable tool for banks, regulators and portfolio managers. Hundreds of academicand nonacademic papers on VaR may be found at http://www.gloriamundi.org/ We start bydefining VaR and discussing its properties.

12.3.1 Value at RiskDefinition

VaR is concerned with the possible loss of a portfolio in a given time horizon. A natural risk measureis the maximum possible loss. However, in most models, the support of the loss distribution isunbounded so that the maximum loss is infinite. The concept of VaR replaces the maximum lossby a maximum loss which is not exceeded with a given (high) probability .

VaR should be computed using the predictive distribution of future losses, that is, the condi-tional distribution of future losses using the current information. However, for horizons h> 1, thisconditional distribution may be hard to obtain.

To be more specific, consider a portfolio whose value at time t is a random variable denotedVt . At horizon h, the loss is denoted

Lt,t+h = −(Vt+h − Vt).

The distribution of Lt,t+h is called the loss distribution (conditional or not). This distribution isused to compute the regulatory capital which allows certain risks to be covered, but not all ofthem. In general, Vt is specified as a function of d unobservable risk factors .

Page 344: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

328 GARCH MODELS

Example 12.6 Suppose, for instance, that the portfolio is composed of d stocks. Denote by Si,tthe price of stock i at time t and by ri,t,t+h = log Si,t+h − log Si,t the log-return. If ai is the numberof stocks i in the portfolio, we have

Vt =d∑i=1

aiSi,t .

Assuming that the composition of the portfolio remains fixed between the dates t and t + h, wehave

Lt,t+h = −d∑i=1

aiSi,t (eri,t,t+h − 1).

The distribution of Vt+h conditional on the available information at time t is called the profitand loss (P&L) distribution.

The determination of reserves depends on

• the portfolio,

• the available information at time t and the horizon h,8

• a level α ∈ (0, 1) characterizing the acceptable risk.9

Denote by Rt,h(α) the level of the reserves. Including these reserves, which are not subject toremuneration, the value of the portfolio at time t + h becomes Vt+h + Rt,h(α). The capital usedto support risk, the VaR, also includes the current portfolio value,

VaRt,h(α) = Vt + Rt,h(α),

and satisfiesPt [Vt+h − Vt < −VaRt,h(α)] < α,

where Pt is the probability conditional on the information available at time t .10 VaR can thus beinterpreted as the capital exposed to risk in the event of bankruptcy. Equivalently,

Pt [VaRt,h(α) < Lt,t+h] < α, i.e. Pt [Lt,t+h ≤ VaRt,h(α)] ≥ 1− α. (12.53)

In probabilistic terms, VaRt,h(α) is thus simply the (1− α)-quantile of the conditional loss distri-bution. If, for instance, for a confidence level 99% and a horizon of 10 days, the VaR of a portfoliois ¤5000, this means that, if the composition of the portfolio does not change, there is a probabilityof 1% that the potential loss over 10 days will be larger than ¤5000.

Definition 12.1 The (1− α)-quantile of the conditional loss distribution is called the VaR at thelevel α:

VaRt,h(α) := inf{x ∈ R | Pt [Lt,t+h ≤ x] ≥ 1− α},

when this quantile is positive. By convention VaRt,h(α) = 0 otherwise.

In particular, it is obvious that VaRt,h(α) is a decreasing function of α.From (12.53), computing a VaR simply reduces to determining a quantile of the conditional

loss distribution. Figure 12.1 compares the VaR of three distributions, with the same variance but

8 For market risk management, h is typically 1 or 10 days. For the regulator (concerned with credit oroperational risk) h is 1 year.

9 1− α is often referred to as the confidence level. Typical values of α are 5% or 3%.10 In the ‘standard’ approach, the conditional distribution is replaced by the unconditional.

Page 345: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 329

−1 1 2 3−0.1

0.1

0.2

0.3

0.4

0.5

0.6

VaR

Loss distribution

α0.01 0.02 0.03 0.04 0.05

1.5

2

2.5

VaR

α

Figure 12.1 VaR is the (1− α)-quantile of the conditional loss distribution (left). The right-hand graph displays the VaR as a function of α ∈ [1%, 5%] for a Gaussian distribution N (solidline), a Student t distribution with 3 degrees of freedom S (dashed line) and a double exponentialdistribution E (thin dotted line). The three laws are standardized so as to have unit variances. Forα = 1% we have VaR(N) < VaR(S) < VaR(E), whereas for α = 5% we have VaR(S) < VaR(E) <VaR(N).

different tail thicknesses. The thickest tail, proportional to 1/x4, is that of the Student t distributionwith 3 degrees of freedom, here denoted S; the thinnest tail, proportional to e−x2/2, is that of theGaussian N; and the double exponential E possesses a tail of intermediate size, proportional toe−√

2|x|. For some very small level α, the VaRs are ranked in the order suggested by the thicknessof the tails: VaR(N) < VaR(E) < VaR(S). However, the right-hand graph of Figure 12.1 showsthat this ranking does not hold for the standard levels α = 1% or α = 5% .

VaR and Conditional Moments

Let us introduce the first two moments of Lt,t+h conditional on the information available at time t :

mt,t+h = Et(Lt,t+h), σ 2t,t+h = Vart (Lt,t+h).

Suppose thatLt,t+h = mt,t+h + σt,t+hL∗h, (12.54)

where L∗h is a random variable with cumulative distribution function Fh. Denote by F←h the quantilefunction of the variable L∗h, defined as the generalized inverse of Fh:

F←h (α) = inf{x ∈ R | Fh(x) ≥ α}, 0 < α < 1.

If Fh is continuous and strictly increasing, we simply have F←h (α) = F−1h (α), where F−1

h is theordinary inverse of Fh. In view of (12.53) and (12.54) it follows that

1− α = Pt [VaRt,h(α) ≥ mt,t+h + σt,t+hL∗] = Fh

(VaRt,h(α)−mt,t+h

σt,t+h

).

Consequently,VaRt,h(α) = mt,t+h + σt,t+hF←h (1− α). (12.55)

VaR can thus be decomposed into an ‘expected loss’ mt,t+h, the conditional mean of the loss, andan ‘unexpected loss’ σt,t+hF←(1− α), also called economic capital .

Page 346: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

330 GARCH MODELS

The apparent simplicity of formula (12.55) masks difficulties in (i) deriving the first conditionalmoments for a given model, and (ii) determining the law Fh, supposedly independent of t , of thestandardized loss at horizon h.

Consider the price of a portfolio, defined as a combination of the prices of d assets, pt = a′Pt ,where a, Pt ∈ Rd . Introducing the price variations �Pt = Pt − Pt−1, we have

Lt,t+h = −(pt+h − pt) = −a′(Pt+h − Pt) = −a′h∑i=1

�Pt+i .

The term structure of the VaR, that is, its evolution as a function of the horizon, can be analyzedin different cases.

Example 12.7 (Independent and identically distributed price variations) If the �Pt+i are iidN (m,) distributed, the law of Lt,t+h is N (−a′mh, a′ah). In view of (12.55), it follows that

VaRt,h(α) = −a′mh+√a′a

√h�−1(1− α). (12.56)

In particular, if m = 0, we haveVaRt,h(α) =

√hVaRt,1(α). (12.57)

The rule that one multiplies the VaR at horizon 1 by√h to obtain the VaR at horizon h is often

erroneously used when the prices variations are not iid, centered and Gaussian (Exercise 12.12).

Example 12.8 (AR(1) price variations) Suppose now that

�Pt −m = A(�Pt−1 −m)+ Ut , (Ut ) iid N (0,),

where A is a matrix whose eigenvalues have a modulus strictly less than 1. The process (�Pt ) isthen stationary with expectation m. It can be verified (Exercise 12.13) that

VaRt,h(α) = a′µt,h +√a′ha�

−1(1− α), (12.58)

where, letting Ai = (I − Ai)(I − A)−1,

µt,h = −mh− AAh(�Pt −m), h =h∑

j=1

Ah−j+1A′h−j+1.

If A = 0, (12.58) reduces to (12.56). Apart from this case, the term multiplying �−1(1− α) is notproportional to

√h.

Example 12.9 (ARCH(1) price variations) For simplicity let d = 1, a = 1 and

�Pt =√ω + α1�P

2t−1Ut , ω> 0, α1 ≥ 0, (Ut ) iid N (0, 1). (12.59)

The conditional law of Lt,t+1 is N (0, ω + α1�P2t ). Therefore,

VaRt,1(α) =√ω + α1�P

2t �

−1(1− α).

Here VaR computation at horizons larger than 1 is problematic. Indeed, the conditional distributionof Lt,t+h is no longer Gaussian when h> 1 (Exercise 12.14).

It is often more convenient to work with the log-returns rt = �1 logpt , assumed to be station-ary, than with the price variations. Letting qt (h, α) be the α-quantile of the conditional distributionof the future returns rt+1 + · · · + rt+h, we obtain (Exercise 12.15)

VaRt,h(α) ={1− eqt (h,α)

}pt . (12.60)

Page 347: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 331

Lack of Subadditivity of VaR

VaR is often criticized for not satisfying, for any distribution of the price variations, the‘subadditivity’ property. Subadditivity means that the VaR for two portfolios after they have beenmerged should be no greater than the sum of their VaRs before they were merged. However, thisproperty does not hold: if L1 and L2 are two loss variables, we do not necessarily have, in obviousnotation,

VaRL1+L2t,h (α) ≤ VaRL1

t,h(α)+ VaRL2t,h(α), ∀α, t, h. (12.61)

Example 12.10 (Pareto distribution) Let L1 and L2 be two independent variables, Pareto dis-tributed, with density f (x) = (2+ x)−2 1lx >−1. The cdf of this distribution is F(x) = (1− (2+x)−1) 1lx >−1, whence the VaR at level α is Var(α) = α−1 − 2. It can be verified, for instance usingMathematica, that

P [L1 + L2 ≤ x] = 1− 2

4+ x− 2 log(3+ x)

(4+ x)2, x >−2.

Thus

P [L1 + L2 ≤ 2VaR(α)] = 1− α − α2

2log

(2− α

α

)< 1− α.

It follows that VaRL1+L2(α)>VaRL1 (α)+ VaRL2 (α) = 2VaRL1 (α),∀α ∈]0, 1[. If, for instance,α = 0.01, we find VaRL1 (0.01) = 98 and, numerically, VaRL1+L2(0.01) ≈ 203.2.

This lack of subadditivity can be interpreted as a nonconvexity with respect to the composition ofthe portfolio. It means that the risk of a portfolio, when measured by the VaR, may be larger thanthe sum of the risks of each of its components (even when these components are independent,except in the Gaussian case). Risk management with VaR thus does not encourage diversification.

12.3.2 Other Risk MeasuresEven if VaR is the most widely used risk measure, the choice of an adequate risk measure is anopen issue. As already seen, the convexity property, with respect to the portfolio composition, is notsatisfied for VaR with some distributions of the loss variable. In what follows, we present severalalternatives to VaR, together with a conceptualization of the ‘expected’ properties of risks measures.

Volatility and Moments

In the Markowitz (1952) portfolio theory, the variance is used as a risk measure. It might then seemnatural, in a dynamic framework, to use the volatility as a risk measure. However, volatility doesnot take into account the signs of the differences from the conditional mean. More importantly, thismeasure does not satisfy some ‘coherency’ properties, as will be seen later (translation invariance,subadditivity).

Expected Shortfall

The expected shortfall (ES), or anticipated loss, is the standard risk measure used in insurancesince Solvency II. This risk measure is closely related to VaR, but avoids certain of its conceptualdifficulties (in particular, subadditivity). It is more sensitive then VaR to the shape of the conditionalloss distribution in the tail of the distribution. In contrast to VaR, it is informative about the expectedloss when a big loss occurs.

Page 348: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

332 GARCH MODELS

Let Lt,t+h be such that EL+t,t+h <∞. In this section, the conditional distribution of Lt,t+h isassumed to have a continuous and strictly increasing cdf. The ES at level α, also referred to asTailvar, is defined as the conditional expectation of the loss given that the loss exceeds the VaR:

ESt,h(α) := Et [Lt,t+h | Lt,t+h >VaRt,h(α)]. (12.62)We have

Et [Lt,t+h 1lLt,t+h >VaRt,h(α)] = Et [Lt,t+h | Lt,t+h >VaRt,h(α)]Pt [Lt,t+h >VaRt,h(α)].

Now Pt [Lt,t+h >VaRt,h(α)] = 1− Pt [Lt,t+h ≤ VaRt,h(α)] = 1− (1− α) = α, where the last butone equality follows from the continuity of the cdf at VaRt,h(α). Thus

ESt,h(α) = 1

αEt [Lt,t+h 1lLt,t+h >VaRt,h(α)]. (12.63)

The following characterization also holds (Exercise 12.16):

ESt,h(α) = 1

α

∫ α

0VaRt,h(u)du. (12.64)

ES thus can be interpreted, for a given level α, as the mean of the VaR over all levels u ≤ α.

Obviously, ESt,h(α) ≥ VaRt,h(α).

Note that the integral representation makes ESt,h(α) a continuous function of α, whatever thedistribution (continuous or not) of the loss variable. VaR does not satisfy this property (for lossvariables which have a zero mass over certain intervals).

Example 12.11 (The Gaussian case) If the conditional loss distribution is N (mt,t+h, σ 2t,t+h)

then, by (12.55), VaRt,h(α) = mt,t+h + σt,t+h�−1(1− α), where � is the N(0, 1) cdf. Using(12.62) and introducing L∗, a variable of law N (0, 1), we have

ESt,h(α) = mt,t+h + σt,t+hE[L∗ | L∗ ≥ �−1(1− α)]

= mt,t+h + σt,t+h1

αE[L∗ 1lL∗≥�−1(1−α)]

= mt,t+h + σt,t+h1

αφ{�−1(1− α)},

where φ is the density of the standard Gaussian. For instance, if α = 0.05, the conditional standarddeviation is multiplied by 1.65 in the VaR formula, and by 2.06 in the ES formula.

More generally, we have under assumption (12.54), in view of (12.64) and (12.55),

ESt,h(α) = mt,t+h + σt,t+h1

α

∫ α

0F←h (1− u)du. (12.65)

Distortion Risk Measures

Continue to assume that the cdf F of the loss distribution is continuous and strictly increasing.For notational simplicity, we omit the indices t and h. From (12.64), the ES is written as

ES(α) =∫ 1

0F−1(1− u) 1l[0,α](u)

1

αdu,

Page 349: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 333

where the term 1l[0,α]1α

can be interpreted as the density of the uniform distribution over [0, α].More generally, a distortion risk measure (DRM) is defined as a number

r(F ;G) =∫ 1

0F−1(1− u)dG(u),

where G is a cdf on [0, 1], called distortion function , and F is the loss distribution. The introductionof a probability distribution on the confidence levels is often interpreted in terms of optimism orpessimism. If G admits a density g which is increasing on [0, 1], that is, if G is convex, the weightof the quantile F−1(1− u) increases with u: large risks receive small weights with this choice ofG. Conversely, if G decreases, those large risks receive the bigger weights.

VaR at level α is a DRM, obtained by taking for G the Dirac mass at α. As we have seen, theES corresponds to the constant density g on [0, α]: it is simply an average over all levels below α.

A family of DRMs is obtained by parameterizing the distortion measure as

rp(F ;G) =∫ 1

0F−1(1− u)dGp(u),

where the parameter p reflects the confidence level, that is, the degree of optimism in the face ofrisk.

Example 12.12 (Exponential DRM) Let

Gp(u) = 1− e−pu

1− e−p,

where p ∈]0,+∞[. We have

rp(F ;G) =∫ 1

0F−1(1− u)

pe−pu

1− e−pdu.

The density function g is decreasing whatever the value of p, which corresponds to an excessiveweighting of the larger risks.

Coherent Risk Measures

In response to criticisms of VaR, several notions of coherent risk measures have been introduced.One of the proposed definitions is the following.

Definition 12.2 Let L denote a set of real random loss variables defined on a measurable space(�,A). Suppose that L contains all the variables that are almost surely constant and is closedunder addition and multiplication by scalars. An application ρ : L �→ R is called a coherent riskmeasure if it has the following properties:

1. Monotonicity: ∀L1, L2 ∈ L, L1 ≤ L2 ⇒ ρ(L1) ≤ ρ(L2).

2. Subadditivity: ∀L1, L2 ∈ L, L1 + L2 ∈ L⇒ ρ(L1 + L2) ≤ ρ(L1)+ ρ(L2).

3. Positive homogeneity: ∀L ∈ L, ∀λ ≥ 0, ρ(λL) = λρ(L).

4. Translation invariance: ∀L ∈ L, ∀c ∈ R, ρ(L+ c) = ρ(L)+ c.

This definition has the following immediate consequences:

1. ρ(0) = 0, using the homogeneity property with L = 0. More generally, ρ(c) = c for allconstants c (if a loss of amount c is certain, a cash amount c should be added to theportfolio).

Page 350: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

334 GARCH MODELS

2. If L ≥ 0, then ρ(L) ≥ 0. If a loss is certain, an amount of capital must be added to theposition.

3. ρ(L− ρ(L)) = 0, that is, the deterministic amount ρ(L) cancels the risk of L.

These requirements are not satisfied for most risk measures used in finance. The variance, ormore generally any risk measure based on the centered moments of the loss distribution, doesnot satisfy the monotonicity property, for instance. The expectation can be seen as a coherent, butuninteresting, risk measure. VaR satisfies all conditions except subadditivity: we have seen that thisproperty holds for (dependent or independent) Gaussian variables, but not for general variables.ES is a coherent risk measure in the sense of Definition 12.2 (Exercise 12.17). It can be shown(see Wang and Dhaene, 1998) that DRMs with G concave satisfy the subadditivity requirement.

12.3.3 Estimation MethodsUnconditional VaR

The simplest estimation method is based on the K last returns at horizon h, that is, rt+h−i (h) =log(pt+h−i/pt−i ), for i = h . . . , h+K − 1. These K returns are viewed as scenarios for futurereturns. The nonparametric historical VaR is simply obtained by replacing qt (h, α) in (12.60)by the empirical α-quantile of the last K returns. Typical values are K = 250 and α = 1% ,which means that the third worst return is used as the empirical quantile. A parametric versionis obtained by fitting a particular distribution to the returns, for example a Gaussian N (µ, σ 2)

which amounts to replacing qt (h, α) by µ+ σ�−1(α), where µ and σ are the estimated meanand standard deviation. Apart from the (somewhat unrealistic) case where the returns are iid, thesemethods have little theoretical justification.

RiskMetrics Model

A popular estimation method for the conditional VaR relies on the RiskMetrics model. This modelis defined by the equations{

rt := log(pt/pt−1) = εt = σtηt , (ηt ) iid N (0, 1),σ 2t = λσ 2

t−1 + (1− λ)ε2t−1,

(12.66)

where λ ∈]0, 1[ is a smoothing parameter, for which, according to RiskMetrics (Longerstaey, 1996),a reasonable choice is λ = 0.94 for daily series. Thus, σ 2

t is simply the prediction of ε2t obtained

by simple exponential smoothing. This model can also be viewed as an IGARCH(1, 1) withoutintercept. It is worth noting that no nondegenerate solution (rt )t∈Z to (12.66) exists (Exercise12.18). Thus, (12.66) is not a realistic data generating process for any usual financial series. Thismodel can, however, be used as a simple tool for VaR computation. From (12.60), we get

VaRt,1(α) ={

1− eσt+1�−1(α)

}pt " ptσt+1�

−1(1− α).

Let �t denote the information generated by εt , εt−1, . . . , ε1. Choosing an arbitrary initial value toσ 2

1 , we obtain σ 2t+1 ∈ �t and

E(σ 2t+i | �t) = E(λσ 2

t+i−1 + (1− λ)σ 2t+i−1 | �t) = E(σ 2

t+i−1 | �t) = σ 2t+1

for i ≥ 2. It follows that Var(rt+1 + · · · + rt+h | �t) = hσ 2t+1. Note however that the conditional

distribution of rt+1 + · · · + rt+h is not exactly N (0, hσ 2t+1) (Exercise 12.19). Many practitioners,

however, systematically use the erroneous formula

VaRt,h(α) =√h VaRt (1, α). (12.67)

Page 351: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 335

GARCH-Based Estimation

Of course, one can use more sophisticated GARCH-type models, rather than the degenerate versionof RiskMetrics. To estimate VaRt (1, α) it suffices to use (12.60) and to estimate qt (1, α) byσt+1F

−1(α), where σ 2t is the conditional variance estimated by a GARCH-type model (for instance,

an EGARCH or TGARCH to account for the leverage effect; see Chapter 10), and F is an estimateof the distribution of the normalized residuals. It is, however, important to note that, even for asimple Gaussian GARCH(1, 1), there is no explicit available formula for computing qt (h, α) whenh> 1. Apart from the case h = 1, simulations are required to evaluate this quantile (but, as canbe seen from Exercise 12.19, this should also be the case with the RiskMetrics method). Thefollowing procedure may then be suggested:

(a) Fit a model, for instance a GARCH(1, 1), on the observed returns rt = εt , t = 1, . . . , n,and deduce the estimated volatility σ 2

t for t = 1, . . . , n+ 1.

(b) Simulate a large number N of scenarios for εn+1, . . . , εn+h by iterating, independently fori = 1, . . . , N , the following three steps:

(b1) simulate the values η(i)n+1, . . . , η(i)n+h iid with the distribution F ;

(b2) set σ (i)n+1 = σn+1 and ε(i)n+1 = σ

(i)n+1η

(i)n+1;

(b3) for k = 2, . . . , h, set(σ(i)n+k)2 = ω + α

(ε(i)n+k−1

)2 + β(σ(i)n+k−1

)2and ε

(i)n+k =

σ(i)n+kη

(i)n+k .

(c) Determine the empirical quantile of simulations ε(i)t+h, i = 1, . . . , N .

The distribution F can be obtained parametrically or nonparametrically. A simple nonparametricmethod involves taking for F the empirical distribution of the standardized residuals rt/σt , whichamounts to taking, in step (b1), a bootstrap sample of the standardized residuals.

Assessment of the Estimated VaR (Backtesting)

The Basel accords allow financial institutions to develop their own internal procedures to evaluatetheir techniques for risk measurement. The term ‘backtesting’ refers to procedures comparing, ona test (out-of-sample) period, the observed violations of the VaR (or any other risk measure), thelatter being computed from a model estimated on an earlier period (in-sample).

To fix ideas, define the variables corresponding to the violations of VaR (‘hit variables’)

It+1(α) = 1l{Lt,t+1 >VaRt,1(α)} .

Ideally, we should have

1

n

n∑t=1

It+1(α) " α and1

n

n∑t=1

Vart,1(α) minimal,

that is, a correct proportion of effective losses which violate the estimated VaRs, with a minimalaverage cost.

Numerical Illustration

Consider a portfolio constituted solely by the CAC 40 index, over the period from March 1, 1990to April 23, 2007. We use the first 2202 daily returns, corresponding to the period from March 2,1990 to December 30, 1998, to estimate the volatility using different methods. To fix ideas, suppose

Page 352: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

336 GARCH MODELS

that on December 30, 1998 the value of the portfolio was, in French Francs, the equivalent of ¤1million. For the second period, from January 4, 1999 to April 23, 2007 (2120 values), we estimatedVaR at horizon h = 1 and level α = 1% using four methods.

The first method (historical) is based on the empirical quantiles of the last 250 returns. Thesecond method is RiskMetrics. The initial value for σ 2

1 was chosen equal to the average of thesquared last 250 returns of the period from March 2, 1990 to December 30, 1998, and we tookλ = 0.94. The third method (GARCH-N) relies on a GARCH(1, 1) model with Gaussian N (0, 1)innovations. With this method we set F−1(0.01) = −2.32635, the 1% quantile of the N (0, 1)distribution. The last method (GARCH-NP) estimates volatility using a GARCH(1, 1) model, andapproximates F−1(0.01) by the empirical 1% quantile of the standardized residuals. For the lasttwo methods, we estimated a GARCH(1, 1) on the first period, and kept this GARCH model for allVaR estimations of the second period. The estimated VaR and the effective losses were comparedfor the 2120 data of the second period.

Table 12.1 and Figure 12.2 do not allow us to draw definitive conclusions, but the historicalmethod appears to be outperformed by the NP-GARCH method. On this example, the only method

Table 12.1 Comparison of the four VaR estimation methods for the CAC 40. On the 2120values, the VaR at the 1% level should only be violated 2120× 1% = 21.2 times on average.

Historic RiskMetrics GARCH-N GARCH-NP

Average estimated VaR (¤) 38 323 32 235 31 950 35 059No. of losses > VaR 29 37 37 21

−60

−40

−20

20

40

60

80

−60

−40

−20

20

40

60

80

−60

−40

−20

20

40

60

80

−60

−40

−20

20

40

60

80

Figure 12.2 Effective losses of the CAC 40 (solid lines) and estimated VaRs (dotted lines) inthousands of euros for the historical method (top left), RiskMetrics (top right), GARCH-N (bottomleft) and GARCH-NP (bottom right).

Page 353: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 337

which adequately controls the level 1% is the NP-GARCH, which is not surprising since theempirical distribution of the standardized residuals is very far from Gaussian.

12.4 Bibliographical Notes

A detailed presentation of the financial concepts introduced in this chapter is provided in the booksby Gourieroux and Jasiak (2001) and Franke, Hardle and Hafner (2004). A classical reference onthe stochastic calculus is the book by Karatzas and Shreve (1988).

The relation between continuous-time processes and GARCH processes was established byNelson (1990b) (see also Nelson, 1992; Nelson and Foster, 1994, 1995). The results obtained byNelson rely on concepts presented in the monograph by Stroock and Varadhan (1979). A synthesisof these results is presented in Elie (1994). An application of these techniques to the TARCHmodel with contemporaneous asymmetry is developed in El Babsiri and Zakoıan (1990).

When applied to high-frequency (intraday) data, diffusion processes obtained as GARCH limitswhen the time unit tends to zero are often found inadequate, in particular because they do not allowfor daily periodicity. There is a vast literature on the so-called realized volatility , which is a dailymeasure of daily return variability. See Barndorff-Nielsen and Shephard (2002) and Andersen etal. (2003) for econometric approaches to realized volatility. In the latter paper, it is argued that‘standard volatility models used for forecasting at the daily level cannot readily accommodatethe information in intraday data, and models specified directly for the intraday data generally failto capture the longer interdaily volatility movements sufficiently well’. Another point of view isdefended in the recent thesis by Visser (2009) in which it is shown that intraday price movementscan be incorporated into daily GARCH models.

Concerning the pricing of derivatives, we have purposely limited our presentation to the ele-mentary definitions. Specialized monographs on this topic are those of Dana and Jeanblanc-Picque(1994) and Duffie (1994). Many continuous-time models have been proposed to extend the Blackand Scholes (1973) formula to the case of a nonconstant volatility. The Hull and White (1987)approach introduces a stochastic differential equation for the volatility but is not compatible withthe assumption of a complete market. To overcome this difficulty, Hobson and Rogers (1998)developed a stochastic volatility model in which no additional Brownian motion is introduced. Adiscrete-time version of this model was proposed and studied by Jeantheau (2004).

The characterization of the risk-neutral measure in the GARCH case is due to Duan (1995).Numerical methods for computing option prices were developed by Engle and Mustafa (1992) andHeston and Nandi (2000), among many others. Problems of option hedging with pricing modelsbased on GARCH or stochastic volatility are discussed in Garcia, Ghysels and Renault (1998). Theempirical performance of pricing models in the GARCH framework is studied by Hardle and Hafner(2000), Christoffersen and Jacobs (2004) and the references therein. Valuation of American optionsin the GARCH framework is studied in Duan and Simonato (2001) and Stentoft (2005). The useof the realized volatility, based on high-frequency data is considered in Stentoft (2008). Statisticalproperties of the realized volatility in stochastic volatility models are studied by Barndorff-Nielsenand Shephard (2002).

Introduced by Engle, Lilien and Robins (1987), ARCH-M models are characterized by alinear relationship between the conditional mean and variance of the returns. These models wereused to test the validity of the intertemporal capital asset pricing model of Merton (1973) whichpostulates such a relationship (see, for instance, Lanne and Saikkonen, 2006). To our knowledge,the asymptotic properties of the QMLE have not been established for ARCH-M models.

The concept of the stochastic discount factor was developed by Hansen and Richard (1987)and, more recently, by Cochrane (2001). Our presentation follows that of Gourieroux and Tiomo(2007). This method is used in Bertholon, Monfort and Pegoraro (2008).

The concept of coherent risk measures (Definition 12.2) was introduced by Artzner et al.(1999), initially on a finite probability space, and extended by Delbaen (2002). In the latter article

Page 354: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

338 GARCH MODELS

it is shown that, for the existence of coherent risk measures, the set L cannot be too large, forinstance the set of all absolutely continuous random variables. Alternative axioms were introducedby Wang, Young and Panjer (1997), initially for risk analysis in insurance. Dynamic VaR modelswere proposed by Koenker and Xiao (2006) (quantile autoregressive models), Engle and Manganelli(2004) (conditional autoregressive VaR), Gourieroux and Jasiak (2008) (dynamic additive quantile).The issue of assessing risk measures was considered by Christoffersen (1998), Christoffersen andPelletier (2004), Engle and Manganelli (2004) and Hurlin and Tokpavi (2006), among others.The article by Escanciano and Olmo (2010) considers the impact of parameter estimation in riskmeasure assessment. Evaluation of VaR at horizons longer than 1, under GARCH dynamics, isdiscussed by Ardia (2008).

12.5 Exercises

12.1 (Linear SDE)Consider the linear SDE (12.6). Letting X0

t denote the solution obtained for ω = 0, what isthe equation satisfied by Yt = Xt/X

0t ?

Hint : the following result, which is a consequence of the multidimensional Ito formula,can be used. If X = (X1

t , X2t ) is a two-dimensional process such that, for a real Brownian

motion (Wt ), {dX1

t = µ1t dt + σ 1

t dWt

dX2t = µ2

t dt + σ 2t dWt

under standard assumptions, then

d(X1t X

2t ) = X1

t dX2t +X2

t dX1t + σ 1

t σ2t dt.

Deduce the solution of (12.6) and verify that if ω ≥ 0, x0 ≥ 0 and (ω, x0) = (0, 0), thenthis solution will remain strictly positive.

12.2 (Convergence of the Euler discretization)Show that the Euler discretization (12.15), with µ and σ continuous, converges in distri-bution to the solution of the SDE (12.1), assuming that this equation admits a unique (indistribution) solution.

12.3 (Another limiting process for the GARCH(1, 1) (Corradi, 2000))Instead of the rates of convergence (12.30) for the parameters of a GARCH(1, 1), consider

limτ→0

τ−1ωτ = ω, limτ→0

τ−sατ = 0, ∀s < 1, limτ→0

τ−1(ατ + βτ − 1) = −δ.

Give an example of the sequence (ωτ , ατ , βτ ) compatible with these conditions. Determinethe limiting process of (X(τ)

t , (σ(τ)t )2) when τ → 0. Show that, in this model, the volatility

σ 2t has a nonstochastic limit when t →∞.

12.4 (Put–call parity)Using the martingale property for the actualized price under the risk-neutral probability,deduce the European put option price from the European call option price.

12.5 (Delta of a European call)Compute the derivative with respect to St of the European call option price and check thatit is positive.

Page 355: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

FINANCIAL APPLICATIONS 339

12.6 (Volatility of an option price)Show that the European call option price Ct = C(S, t) is solution of an SDE of the form

dCt = µtCtdt + σtCtdWt

with σt > σ.

12.7 (Estimation of the drift and volatility)Compute the maximum likelihood estimators of µ and σ 2 based on observations S1, . . . , Snof the geometric Brownian motion.

12.8 (Vega of a European call)A measure of the sensitivity of an option to the volatility of the underlying asset is theso-called vega coefficient defined by ∂Ct/∂σ . Compute this coefficient for a European calland verify that it is positive. Is this intuitive?

12.9 (Martingale property under the risk-neutral probability)Verify that under the measure π defined in (12.52), the actualized price e−rt St is a martingale.

12.10 (Risk-neutral probability for a nonlinear GARCH model)Duan (1995) considered the model log (St /St−1) = r + λσt − σ 2

t

2 + εt ,

εt = σtηt ,

σ 2t = ω + {α(ηt−1 − γ )2 + β}σ 2

t−1,

where ω> 0, α, β ≥ 0 and (ηt )iid∼ N (0, 1). Establish the strict and second-order stationar-

ity conditions for the process (εt ). Determine the risk-neutral probability using stochasticdiscount factors, chosen to be affine exponential with time-dependent coefficients.

12.11 (A nonaffine exponential SDF)Consider an SDF of the form

Mt,t+1 = exp(at + btηt+1 + ctη2t+1).

Show that, by an appropriate choice of the coefficients at , bt and ct , with ct = 0, a risk-neutral probability can be obtained for model (12.51). Derive the risk-neutral version of themodel and verify that the volatility differs from that of the initial model.

12.12 (An erroneous computation of the VaR at horizon h.)The aim of this exercise is to show that (12.57) may be wrong if the price variations areiid but non-Gaussian. Suppose that (a′�Pt) is iid, with a double exponential density withparameter λ, given by f (x) = 0.5λ exp{−λ|x|}. Calculate VaRt,1(α). What is the density ofLt,t+2? Deduce the equation for VaR at horizon 2. Show, for instance for λ = 0.1, that VaRis overevaluated if (12.57) is applied with α = 0.01, but is underevaluated with α = 0.05.

12.13 (VaR for AR(1) prices variations.)Check that formula (12.58) is satisfied.

12.14 (VaR for ARCH(1) prices variations.)Suppose that the price variations follow an ARCH(1) model (12.59). Show that the distri-bution of �Pt+2 conditional on the information at time t is not Gaussian if α1 > 0. Deducethat VaR at horizon 2 is not easily computable.

Page 356: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

340 GARCH MODELS

12.15 (VaR and conditional quantile)Derive formula (12.60), giving the relationship between VaR and the returns conditionalquantile.

12.16 (Integral formula for the ES)Using the fact that Lt,t+h and F−1(U) have the same distribution, where U denotes avariable uniformly distributed on [0, 1] and F the cdf of Lt,t+h, derive formula (12.64).

12.17 (Coherence of the ES)Prove that the ES is a coherent risk measure.Hint for proving subadditivity : For Li such that EL+i <∞, i = 1, 2, 3, denote the value atrisk at level α by VaRi (α) and the expected shortfall by ESi (α) = α−1E[Li 1lLi≥VaRi (α)].For L3 = L1 + L2, compute α{ES1(α)+ ES2(α)− ES3(α)} using expectations and observethat

(L1 − VaR1(α))(1lL1≥VaR1(α)− 1lL3≥VaR3(α)) ≥ 0.

12.18 (RiskMetrics is a prediction method, not really a model)For any initial value σ 2

0 let (εt )t≥1 be a sequence of random variables satisfying the Risk-Metrics model (12.66) for any t ≥ 1. Show that εt → 0 almost surely as t →∞.

12.19 (At horizon h> 1 the conditional distribution of the future returns is not Gaussian withRiskMetrics)Prove that in the RiskMetrics model, the conditional distribution of the returns at horizon2, rt+1 + rt+2, is not Gaussian. Conclude that formula (12.67) is incorrect.

Page 357: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Part IV

Appendices

Page 358: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 359: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Appendix A

Ergodicity, Martingales, Mixing

A.1 Ergodicity

A stationary sequence is said to be ergodic if it satisfies the strong law of large numbers.

Definition A.1 (Ergodic stationary processes) A strictly stationary process (Zt )t∈Z, real-valued, is said to be ergodic if and only if, for any Borel set B and any integer k,

n−1n∑t=1

1lB(Zt , Zt+1, . . . , Zt+k)→ P {(Z1, . . . , Z1+k) ∈ B}

with probability 1.1

General transformations of ergodic sequences remain ergodic. The proof of the following resultcan be found, for instance, in Billingsley (1995, Theorem 36.4).

Theorem A.1 If (Zt )t∈Z is an ergodic strictly stationary sequence and if (Yt )t∈Z is defined by

Yt = f (. . . , Zt−1, Zt , Zt+1, . . . ),

where f is a measurable function from R∞ to R, then (Yt )t∈Z is also an ergodic strictly stationarysequence.

In particular, if (Xt)t∈Z is the nonanticipative stationary solution of the AR(1) equation

Xt = aXt−1 + ηt , |a| < 1, ηt iid (0, σ 2), (A.1)

then the theorem shows that (Xt)t∈Z, (Xt−1ηt )t∈Z and (X2t−1)t∈Z are also ergodic stationary

sequences.

1 The ergodicity concept is much more general, and can be extended to nonstationary sequences (see, forinstance, Billingsley, 1995).

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 360: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

344 GARCH MODELS

Theorem A.2 (The ergodic theorem for stationary sequences) If (Zt )t∈Z is strictly stationaryand ergodic, if f is measurable and if

E|f (. . . , Zt−1, Zt , Zt+1, . . . )| <∞,

then

n−1n∑t=1

f (. . . , Zt−1, Zt , Zt+1, . . . )→ Ef (. . . , Zt−1, Zt , Zt+1, . . . ) a.s.

As an example, consider the least-squares estimator an of the parameter a in (A.1). By definition

an = arg minaQn(a), Qn(a) =

n∑t=2

(Xt − aXt−1)2.

From the first-order condition, we obtain

an = n−1∑nt=2 XtXt−1

n−1∑n

t=2 X2t−1

.

The ergodic theorem shows that the numerator tends almost surely to γ (1) = Cov(Xt , Xt−1) =aγ (0) and that the denominator tends to γ (0). It follows that an → a almost surely as n→∞.Note that this result still holds true when the assumption that ηt is a strong white noise is replacedby the assumption that ηt is a semi-strong white noise, or even by the assumption that ηt is anergodic and stationary weak white noise.

A.2 Martingale Increments

In a purely random fair game (for instance, A and B play ‘heads or tails’, A gives one dollar to B

if the coin falls tails, and B gives one dollar to A if the coin falls heads), the winnings of a playerconstitute a martingale.

Definition A.2 (Martingale) Let (Yt )t∈N be a sequence of real random variables and (Ft )t∈N

be a sequence of σ -fields. The sequence (Yt ,Ft )t∈N is said to be a martingale if and only if

1. Ft ⊂ Ft+1;

2. Yt is Ft -measurable;

3. E|Yt | <∞;

4. E(Yt+1|Ft ) = Yt .

When (Yt )t∈N is said to be a martingale, it is implicitly assumed that Ft = σ(Yu, u ≤ t), that is,the σ -field generated by the past and present values.

Definition A.3 (Martingale difference) Let (ηt )t∈N be a sequence of real random variables and(Ft )t∈N be a sequence of σ -fields. The sequence (ηt ,Ft )t∈N is said to be a martingale difference(or a sequence of martingale increments) if and only if

1. Ft ⊂ Ft+1;

2. ηt is Ft -measurable;

3. E|ηt | <∞;

4. E(ηt+1|Ft ) = 0.

Page 361: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ERGODICITY, MARTINGALES, MIXING 345

Remark A.1 If (Yt ,Ft )t∈N is a martingale and η0 = Y0, ηt = Yt − Yt−1, then (ηt ,Ft )t∈N is amartingale difference: E(ηt+1|Ft ) = E(Yt+1|Ft )−E(Yt |Ft ) = 0.

Remark A.2 If (ηt ,Ft )t∈N is a martingale difference and Yt = η0 + η1 + · · · + ηt , then(Yt ,Ft )t∈N is a martingale: E(Yt+1|Ft ) = E(Yt + ηt+1|Ft ) = Yt .

Remark A.3 In example (A.1),{k∑i=0

aiηt−i , σ (ηu, t − k ≤ u ≤ t)

}k∈N

is a martingale, and {ηt , σ (ηu, u ≤ t)}t∈N, {ηtXt−1, σ (ηu, u ≤ t)}t∈N are martingale differences.

There exists a central limit theorem (CLT) for triangular sequences of martingale differences (seeBillingsley, 1995, p. 476).

Theorem A.3 (Lindeberg’s CLT) Assume that, for each n> 0, (ηnk,Fnk)k∈N is a sequence ofsquare integrable martingale increments. Let σ 2

nk = E(η2nk|Fn(k−1)). If

n∑k=1

σ 2nk → σ 2

0 in probability as n→∞, (A.2)

where σ0 is a strictly positive constant, and

n∑k=1

Eη2nk 1l{|ηnk |≥ε} → 0 as n→∞, (A.3)

for any positive real ε, then∑n

k=1 ηnkL→ N(0, σ 2

0 )

Remark A.4 In numerous applications, ηnk and Fnk are only defined for 1 ≤ k ≤ n and can bedisplayed as a triangular array

η11

η21 η22

η31 η32 η33...

ηn1 ηn2 · · · ηnn...

One can define ηnk and Fnk for all k ≥ 0, with ηn0 = 0, Fn0 = {∅,�} and ηnk = 0, Fnk = Fnn

for all k >n. In the theorem each row of the triangular array is assumed to be a martingaledifference.

Remark A.5 The previous theorem encompasses the usual CLT. Let Z1, . . . , Zn be an iidsequence with a finite variance. It suffices to take

ηnk = Zk − EZk√n

and Fnk = σ(Z1, . . . , Zk).

Page 362: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

346 GARCH MODELS

It is clear that (ηnk,Fnk)k∈N is a square integrable martingale difference. We have σ 2nk = Eη2

nk =n−1Var(Z0). Consequently, the normalization condition (A.2) is satisfied. Moreover,

n∑k=1

Eη2nk 1l{|ηnk |≥ε} =

n∑k=1

n−1∫{|Zk−EZk |≥√nε}

|Zk − EZk|2dP

=∫{|Z1−EZ1|≥

√nε}|Z1 − EZ1|2dP → 0

because{|Z1 − EZ1| ≥ √nε

} ↓ ∅ and∫�|Z1 − EZ1|2dP <∞. The Lindeberg condition (A.3)

is thus satisfied. The theorem entails the standard CLT:

n∑k=1

ηnk = 1√n

n∑k=1

(Zk − EZk).

Remark A.6 In example (A.1), take

ηnk = ηkXk−1√n

and Fnk = σ(ηu, u ≤ k).

The sequence (ηnk,Fnk)k∈N is a square integrable martingale difference. We have σ 2nk =

n−1σ 2X2k−1. The ergodic theorem entails (A.2) with σ 2

0 = σ 4/(1− a2). We obtain

n∑k=1

Eη2nk 1l{|ηnk |≥ε} =

n∑k=1

n−1∫{|ηkXk−1|≥

√nε}|ηkXk−1|2dP

=∫{|η1X0|≥

√nε}|η1X0|2dP → 0

because{|η1X0| ≥ √nε

} ↓ ∅ and∫�|η1X0|2dP <∞. This shows (A.3). The Lindeberg CLT

entails that

n−1/2n∑

k=1

ηkXk−1L→ N(0, σ 4/(1− a2)).

It follows that

n1/2 (an − a) = n−1/2∑nk=1 ηkXk−1

n−1∑n

k=1 X2k−1

L→ N(0, 1− a2), (A.4)

because n−1∑nk=1 X

2k−1 → σ 2/(1− a2).2

Remark A.7 The previous result can be used to obtain an asymptotic confidence interval or fortesting the coefficient a:

1.[an ± 1.96n−1/2

(1− a2

n

)1/2]

is a confidence interval for a at the asymptotic 95% confidencelevel.

2. The null assumption H0 : a = 0 is rejected at the asymptotic 5% level if |tn|> 1.96, wheretn =

√nan/

√1− a2

n is the t-statistic.

2 We have used Slutsky’s lemma: if YnL→ Y and Tn → T in probability, then TnYn

L→ YT .

Page 363: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ERGODICITY, MARTINGALES, MIXING 347

In the previous statistics, 1− a2n is often replaced by σ 2/γ (0), where σ 2 =∑n

t=1(Xt −anXt−1)

2/(n− 1) and γ (0) = n−1∑nt=1 X

2t−1. Asymptotically, there is no difference:

n− 1

n

σ 2

γ (0)=∑n

t=1(Xt − anXt−1)2∑n

t=1 X2t−1

=∑n

t=1 X2t + a2

n

∑nt=1 X

2t−1 − 2an

∑nt=1 XtXt−1∑n

t=1 X2t−1

=∑n

t=1 X2t∑n

t=1 X2t−1

− a2n.

However, it is preferable to use σ 2/γ (0), which is always positive, rather than 1− a2n because, in

finite samples, one can have a2n > 1.

The following corollary applies to GARCH processes, which are stationary and ergodic martingaledifferences.

Corollary A.1 (Billingsley, 1961) If (νt ,Ft )t is a stationary and ergodic sequence of squareintegrable martingale increments such that σ 2

ν = Var(νt ) = 0, then

n−1/2n∑t=1

νtL→ N (0, σ 2

ν

).

Proof. Let ηnk = νk/√n and Fnk = Fk . For all n, the sequence (ηnk,Fk)k is a square inte-

grable martingale difference. With the notation of Theorem A.3, we have σ 2nk = E(η2

nk|Fk−1), and(nσ 2

nk)k ={E(ν2

k |Fk−1)}k

is a stationary and ergodic sequence. We thus have almost surely

n∑k=1

σ 2nk =

1

n

n∑k=1

E(ν2k |Fk−1)→ E

{E(ν2

k |Fk−1)} = σ 2

ν > 0,

which shows the normalization condition (A.2). Moreover,

n∑k=1

Eη2nk 1l{|ηnk |≥ε} =

n∑k=1

n−1∫{|νk |≥√nε}

ν2kdP =

∫{|ν1|≥

√nε}

ν21 dP → 0,

using stationarity and Lebesgue’s theorem. This shows (A.3). The corollary is thus a consequenceof Theorem A.3. �

A.3 Mixing

Numerous probabilistic tools have been developed for measuring the dependence between variables.For a process, elementary measures of dependence are the autocovariances and autocorrelations.When there is no linear dependence between Xt and Xt+h, as is the case for a GARCH pro-cess, the autocorrelation is not the right tool, and more elaborate concepts are required. Mixingassumptions, introduced by Rosenblatt (1956), are used to convey different ideas of asymptoticindependence between the past and future of a process. We present here two of the most popularmixing coefficients.

Page 364: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

348 GARCH MODELS

A.3.1 α-Mixing and β-Mixing CoefficientsThe strong mixing (or α-mixing) coefficient between two σ -fields A and B3 is defined by

α(A,B) = supA∈A,B∈B

|P(A ∩ B)− P(A)P(B)| .

It is clear that:

(i) if A and B are independent then α(A,B) = 0;

(ii) 0 ≤ α(A,B) ≤ 1/4;4

(iii) α(A,A) = 1/4 provided that A contains an event of probability 1/2;

(iv) α(A,A)> 0 provided that A is nontrivial;5

(v) α(A′,B′) ≤ α(A,B) provided that A′ ⊂ A and B′ ⊂ B.

The strong mixing coefficients of a process X = (Xt ) are defined by

αX(h) = supt

α {σ (Xu, u ≤ t) , σ (Xu, u ≥ t + h)} .

If X is stationary, the term supt can be omitted. In this case, we have

αX(h) = supA,B

|P(A ∩ B)− P(A)P(B)|

= supf,g

|Cov(f (. . . , X−1, X0), g(Xh,Xh+1, . . .))| (A.5)

where the first supremum is taken on A ∈ σ(Xs, s ≤ 0) and B ∈ σ(Xs, s ≥ h) and the second istaken on the set of the measurable functions f and g such that |f | ≤ 1, |g| ≤ 1. X is said to bestrongly mixing, or α-mixing, if αX(h)→ 0 as h→∞. If αX(h) tends to zero at an exponentialrate, then X is said to be geometrically strongly mixing .

The β-mixing coefficients of a stationary process X are defined by

βX(k) = E supB∈σ(Xs ,s≥k)

|P(B | σ(Xs, s ≤ 0))− P(B)|

= 1

2sup

I∑i=1

J∑j=1

|P(Ai ∩ Bj )− P(Ai)P(Bj )|, (A.6)

where in the last equality, the sup is taken among all the pairs of partitions {A1, . . . , AI } and{B1, . . . , BJ } of � such that Ai ∈ σ(Xs, s ≤ 0) for all i and Bj ∈ σ(Xs, s ≥ k) for all j . Theprocess is said to be β-mixing if limk→∞ βX(k) = 0. We have

αX(k) ≤ βX(k),

3 Obviously we are working with a probability space (�,A0,P), and A and B are sub-σ -fields of A0.4 It suffices to note that

|P(A ∩ B)− P(A)P(B)| = ∣∣P(A ∩ B)− P(A ∩ B)P(B)− P(A ∩ Bc)P(B)∣∣

= ∣∣P(A ∩ B)P(Bc)− P(A ∩ Bc)P(B)∣∣

≤ P(Bc)P(B) ≤ 1/4.

For the first inequality we use ||a| − |b|| ≤ max{|a|, |b|}. Alternatively, one can note that P(A ∩ B)−P(A)P(B) = Cov(1lA, 1lB) and use the Cauchy–Schwarz inequality.

5 That is, A contains an event of probability different from 0 or 1.

Page 365: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ERGODICITY, MARTINGALES, MIXING 349

so that β-mixing implies α-mixing. If Y = (Yt ) is a process such that Yt = f (Xt , . . . , Xt−r )for a measurable function f and an integer r ≥ 0, then σ (Yt , t ≤ s) ⊂ σ (Xt , t ≤ s) andσ (Yt , t ≥ s) ⊂ σ (Xt−r , t ≥ s). In view of point (v) above, this entails that

αY (k) ≤ αX(k − r) and βY (k) ≤ βX(k − r) for all k ≥ r . (A.7)

Example A.1 It is clear that a q-dependent process such that

Xt ∈ σ(εt , εt−1, . . . , εt−q), where the εt are independent,

is strongly mixing, because

αX (h) ≤ αε (h− q) = 0, ∀h ≥ q.

Example A.2 Consider the process defined by

Xt = Y cos(λt),

where λ ∈ (0, π) and Y is a nontrivial random variable. Note that when cos(λt) = 0, which occursfor an infinite number of t , we have σ(Xt) = σ(Y ). We thus have, for any t and any h, σ(Xu,

u ≤ t) = σ(Xu, u ≥ t + h) = σ(Y ), and αX(h) = α {σ(Y ), σ (Y )}> 0 by (iv), which shows thatX is not strongly mixing.

Example A.3 Let (ut ) be an iid sequence uniformly distributed on {1, . . . , 9}. Let

Xt =∞∑i=0

10−i−1ut−i .

The sequence ut , ut−1, . . . constitutes the decimals of Xt ; we can write Xt = 0.utut−1 . . . . Theprocess X = (Xt ) is stationary and satisfies a strong AR(1) representation of the form

Xt = 1

10Xt−1 + 1

10ut = 1

10Xt−1 + 1

2+ εt

where εt = (ut /10)− (1/2) is a strong white noise. The process X is not α-mixing becauseσ(Xt ) ⊂ σ(Xt+h) for all h ≥ 0,6 and by (iv) and (v),

αX (h) ≥ α {σ(Xt ), σ (Xt)}> 0, ∀h ≥ 0.

A.3.2 Covariance InequalityLet p, q and r be three positive numbers such that p−1 + q−1 + r−1 = 1. Davydov (1968) showedthe covariance inequality

|Cov(X, Y )| ≤ K0‖X‖p‖Y‖q [α {σ(X), σ (Y )}]1/r , (A.8)

where ‖X‖pp = EXp and K0 is a universal constant. Davydov initially proposed K0 = 12. Rio(1993) obtained a sharper inequality, involving the quantile functions of X and Y . The latter

6 This would not be true if we added 0 to the set of the possible values of ut . For instance, Xt = 0.4999 . . . =0.5000 . . . would not tell us whether Xt−1 is equal to 1 = 0.999 . . . or to 0.

Page 366: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

350 GARCH MODELS

inequality also shows that one can take K0 = 4 in (A.8). Note that (A.8) entails that the autoco-variance function of an α-mixing stationary process (with enough moments) tends to zero.

Example A.4 Consider the process X = (Xt) defined by

Xt = Y cos(ωt)+ Z sin(ωt),

where ω ∈ (0, π), and Y and Z are iid N(0, 1). Then X is Gaussian, centered and stationary, withautocovariance function γX(h) = cos(ωh). Since γX(h) → 0 as |h| → ∞, X is not mixing.

From inequality (A.8), we obtain, for instance, the following results.

Corollary A.2 Let X = (Xt) be a centered process such that

supt

‖Xt‖2+ν <∞,

∞∑h=0

{αX(h)}ν/(2+ν) <∞, for some ν > 0.

We have

EX2n = O

(1

n

).

Proof. Let K = K0 supt ‖Xt‖22+ν . From (A.8), we obtain

|EXt1Xt2 | = |Cov(Xt1 , Xt2 )| ≤ K{αX(|t2 − t1|)}ν/(2+ν).

We thus have

EX2n ≤

2

n2

∑1≤t1≤t2≤n

|EXt1Xt2 | ≤2

nK

∞∑h=0

{αX (h)}ν/(2+ν) = O

(1

n

).

Corollary A.3 Let X = (Xt) be a centered process such that

supt

‖Xt‖4+2ν <∞,

∞∑ =0

{αX( )}ν/(2+ν) <∞, for some ν > 0.

We have

supt

suph,k≥0

+∞∑ =−∞

|Cov (XtXt+h,Xt+ Xt+k+ )| <∞.

Proof. Consider the case 0 ≤ h ≤ k. We have

+∞∑ =−∞

|Cov (XtXt+h,Xt+ Xt+k+ )| = d1 + d2 + d3 + d4,

Page 367: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

ERGODICITY, MARTINGALES, MIXING 351

where

d1 =+∞∑ =h|Cov (XtXt+h,Xt+ Xt+k+ )| ,

d2 =−k∑

=−∞|Cov (XtXt+h,Xt+ Xt+k+ )| ,

d3 =+h−1∑ =0

|Cov (XtXt+h,Xt+ Xt+k+ )| ,

d4 =−1∑

=−k+1

|Cov (XtXt+h,Xt+ Xt+k+ )| .

Inequality (A.8) shows that d1 and d2 are uniformly bounded in t , h and k by

K0 supt

‖Xt‖44+2ν

∞∑ =0

{αX( )}ν/(2+ν).

To handle d3, note that d3 ≤ d5 + d6, where

d5 =+h−1∑ =0

|Cov (XtXt+ , Xt+hXt+k+ )| ,

d6 =+h−1∑ =0

|EXtXt+hEXtXt+k − EXtXt+ EXt+hXt+k+ | .

With the notation K = K0 supt ‖Xt‖44+2ν , we have

d5 ≤ K0 supt

‖Xt‖44+2ν

+h−1∑ =0

{αX(h− )}ν/(2+ν) ≤ K

∞∑ =0

{αX( )}ν/(2+ν),

d6 ≤ supt

‖Xt‖22

(h|EXtXt+h| +

+h−1∑ =0

|EXtXt+ |)

≤ K

{sup ≥0

{αX( )}ν/(2+ν) +∞∑ =0

{αX( )}ν/(2+ν)}.

For the last inequality, we used

supt

‖Xt‖22|EXtXt+h| ≤ K0 sup

t

‖Xt‖22 sup

t

‖Xt‖22+ν{αX(h)}ν/(2+ν)

≤ K{αX(h)}ν/(2+ν).

Since the sequence of the mixing coefficients is decreasing, it is easy to see thatsup ≥0 {αX( )}ν/(2+ν) <∞ (cf. Exercise 3.9). The term d3 is thus uniformly bounded int , h and k. We treat d4 as d3 (see Exercise 3.10). �

Page 368: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

352 GARCH MODELS

Corollary A.4 Under the conditions of Corollary A.3, we have

EX4n = O

(1

n

).

Proof. Let t1 ≤ t2 ≤ t3 ≤ t4. From (A.8), we obtain

EXt1Xt2Xt3Xt4 = Cov(Xt1 , Xt2Xt3Xt4 )

≤ K0‖Xt1‖4+2ν‖Xt2Xt3Xt4‖(4+2ν)/3{αX(|t2 − t1|)}ν/(2+ν)

≤ K{αX(|t2 − t1|)}ν/(2+ν).

We thus have

EX4n ≤

4!

n4

∑1≤t1≤t2≤t3≤t4≤n

EXt1Xt2Xt3Xt4

≤ 4!

nK

∞∑h=0

{αX (h)}ν/(2+ν) = O

(1

n

).

In the last inequality, we use the fact that the number of indices 1 ≤ t1 ≤ t2 ≤ t3 ≤ t4 ≤ n suchthat t2 = t1 + h is less than n3. �

A.3.3 Central Limit TheoremHerrndorf (1984) showed the following central limit theorem.

Theorem A.4 (CLT for α-mixing processes) Let X = (Xt) be a centered process such that

supt

‖Xt‖2+ν <∞,

∞∑h=0

{αX(h)}ν/(2+ν) <∞, for some ν > 0.

If σ 2 = limn→∞ Var(n−1/2∑n

t=1 Xt

)exists and is not zero, then

n−1/2n∑t=1

XtL→ N(0, σ 2).

Page 369: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Appendix B

Autocorrelation and PartialAutocorrelation

B.1 Partial Autocorrelation

DefinitionThe (theoretical) partial autocorrelation at lag h> 0, rX(h), of a second-order stationary processX = (Xt ) with nondegenerate linear innovations,1 is the correlation between

Xt − EL(Xt |Xt−1, Xt−2, . . . , Xt−h+1)

andXt−h − EL(Xt−h|Xt−1, Xt−2, . . . , Xt−h+1),

where EL(Y |Y1, . . . , Yk) denotes the linear regression of a square integrable variable Y on variablesY1, . . . , Yk . Let

rX(h) = Corr (Xt ,Xt−h|Xt−1, Xt−2, . . . , Xt−h+1) . (B.1)

The number rX(h) can be interpreted as the residual correlation between Xt and Xt−h, after thelinear influence of the intermediate variables Xt−1, Xt−2, . . . , Xt−h+1 has been subtracted. Assumethat (Xt ) is centered, and consider the linear regression of Xt on Xt−1, . . . , Xt−h:

Xt = ah,1Xt−1 + · · · + ah,hXt−h + uh,t , uh,t⊥Xt−1, . . . , Xt−h. (B.2)

We haveEL(Xt |Xt−1, . . . , Xt−h) = ah,1Xt−1 + · · · + ah,hXt−h, (B.3)

EL(Xt−h−1|Xt−1, . . . , Xt−h) = ah,1Xt−h + · · · + ah,hXt−1, (B.4)

andrX(h) = ah,h. (B.5)

1 Thus the variance of εt := Xt − EL(Xt |Xt−1, . . . ) is not equal to zero.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 370: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

354 GARCH MODELS

Proof of (B.3) and (B.4). We obtain (B.3) from (B.2), using the linearity of EL(·|Xt−1, . . . ,

Xt−h) and ah,1Xt−1 + · · · + ah,hXt−h⊥uh,t . The vector of the coefficients of the theoretical linearregression of Xt−h−1 on Xt−1, . . . , Xt−h is given by

E Xt−1

...

Xt−h

( Xt−1 . . . Xt−h)

−1

EXt−h−1

Xt−1...

Xt−h

. (B.6)

Since

E

Xt−1...

Xt−h

( Xt−1 . . . Xt−h) = E

Xt−h...

Xt−1

( Xt−h . . . Xt−1)

and

EXt−h−1

Xt−1...

Xt−h

= EXt

Xt−h...

Xt−1

,

this is also the vector of the coefficients of the linear regression of Xt on Xt−h, . . . , Xt−1, whichgives (B.4). �

Proof of (B.5). From (B.2) we obtain

EL(Xt |Xt−1, . . . , Xt−h+1) = ah,1Xt−1 + · · · + ah,h−1Xt−h+1

+ ah,hE(Xt−h|Xt−1, . . . , Xt−h+1).

Thus

Xt − EL(Xt |Xt−1, . . . , Xt−h+1) = ah,h {Xt−h − EL(Xt−h|Xt−1, . . . , Xt−h+1)} + uh,t .

This equality being of the form Y = ah,hX + u with u⊥X, we obtain Cov(Y,X) = ah,hVar(X),which gives

ah,h = Cov {Xt − EL(Xt |Xt−1, . . . , Xt−h+1),Xt−h − EL(Xt−h|Xt−1, . . . , Xt−h+1)}Var {Xt−h − EL(Xt−h|Xt−1, . . . , Xt−h+1)} .

To conclude, it suffices to note that, using the parity of γX(·) and (B.4),

Var {Xt − EL(Xt |Xt−1, . . . , Xt−h+1)}= Var

{Xt − ah−1,1Xt−1 − · · · − ah−1,h−1Xt−h+1

}= Var

{Xt−h − ah−1,1Xt−h+1 − · · · − ah−1,h−1Xt−1

}= Var {Xt−h − EL(Xt−h|Xt−1, . . . , Xt−h+1)} .

Page 371: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

AUTOCORRELATION AND PARTIAL AUTOCORRELATION 355

Computation AlgorithmFrom the autocorrelations ρX(1), . . . , ρX(h), the partial autocorrelation rX(h) can be computedrapidly with the aid of Durbin’s algorithm:

a1,1 = ρX(1), (B.7)

ak,k = ρX(k)−∑k−1

i=1 ρX(k − i)ak−1,i

1−∑k−1i=1 ρX(i)ak−1,i

, (B.8)

ak,i = ak−1,i − ak,kak−1,k−i , i = 1, . . . , k − 1. (B.9)

Steps (B.8) and (B.9) are repeated for k = 2, . . . , h− 1, and then rX(h) = ah,h is obtained bystep (B.8); see Exercise 1.14.

Proof of (B.9). In view of (B.2),

EL(Xt |Xt−1, . . . , Xt−k+1) =k−1∑i=1

ak,iXt−i + ak,kEL(Xt−k|Xt−1, . . . , Xt−k+1).

Using (B.4), we thus have

k−1∑i=1

ak−1,iXt−i =k−1∑i=1

ak,iXt−i + ak,k

k−1∑i=1

ak−1,k−iXt−i ,

which gives (B.9) (the variables Xt−1, . . . , Xt−k+1 are not almost surely linearly dependent becausethe innovations of (Xt) are nondegenerate). �

Proof of (B.8). The vector of coefficients of the linear regression of Xt on Xt−1, . . . , Xt−hsatisfies

E

Xt−1...

Xt−h

( Xt−1 . . . Xt−h) ah,1

...

ah,h

= EXt

Xt−1...

Xt−h

. (B.10)

The last row of (B.10) yields

h∑i=1

ah,iγ (h− i) = γ (h).

Using (B.9), we thus have

ah,h = ρ(h)−h−1∑i=1

ρ(h− i)ah,i

= ρ(h)−h−1∑i=1

ρ(h− i)(ah−1,i − ah,hah−1,h−i)

= ρ(h)−∑h−1i=1 ρ(h− i)ah−1,i

1−∑h−1i=1 ρ(h− i)ah−1,h−i

,

which gives (B.8). �

Page 372: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

356 GARCH MODELS

Behavior of the Empirical Partial AutocorrelationThe empirical partial autocorrelation, r(h), is obtained form the algorithm (B.7)–(B.9), replacingthe theoretical autocorrelations ρX(k) by the empirical autocorrelations ρX(k), defined by

ρX(h) = γX(h)

γX(0), γX(h) = γX(−h) = n−1

n−h∑t=1

XtXt+h,

for h = 0, 1, . . . , n− 1. When (Xt ) is not assumed to be centered, Xt is replaced by Xt − Xn.In view of (B.5), we know that, for an AR(p) process, we have rX(h) = 0, for all h>p. Whenthe noise is strong, the asymptotic distribution of the r(h), h>p, is quite simple.

Theorem B.1 (Asymptotic distribution of the r(h)s for a strong AR(p) model) If X is thenonanticipative stationary solution of the AR(p) model

Xt −p∑i=1

aiXt−i = ηt , ηt iid(0, σ 2), σ 2 = 0, 1−p∑i=1

aizi = 0 ∀|z| ≤ 1,

then √nr(h)

L→ N (0, 1) , ∀h>p.

Proof. Let a0 = (a1, . . . , ap, 0, . . . , 0) be the vector of coefficients of the AR(h) model, whenh>p. Consider

X =

Xn−1 . . . Xn−hXn−2 . . . Xn−h−1...

X0 . . . X1−h

, Y =

Xn

Xn−1...

X1

and a = {X′X}−1X′Y ,

the coefficient of the empirical regression of Xt on Xt−1, . . . , Xt−h (taking Xt = 0 for t ≤ 0).

It can be shown that, as for a standard regression model,√n(a − a0)

L→ N (0,) , where

a.s.= σ 2 lim

n→∞ n−1 {X′X}−1 = σ 2

γX(0) γX(1) · · · γX(h− 1)γX(1) γX(0) · · · γX(h− 2)...

...

γX(h− 1) · · · γX(1) γX(0)

−1

.

Since rX(h) is the last component of a (by (B.5)), we have

√nr(h)

L→ N (0,(h, h)) ,

with

(h, h) = σ 2�(0, h− 1)

�(0, h), �(0, j) =

∣∣∣∣∣∣∣∣∣γX(0) γX(1) · · · γX(j − 1)γX(1) γX(0) · · · γX(j − 2)...

...

γX(j − 1) · · · γX(1) γX(0)

∣∣∣∣∣∣∣∣∣ .Applying the relations

γX(0)−h−1∑i=1

aiγX(i) = σ 2, γX(k)−h−1∑i=1

aiγX(k − i) = 0,

Page 373: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

AUTOCORRELATION AND PARTIAL AUTOCORRELATION 357

for k = 1, . . . , h− 1, we obtain

�(0, h) =

∣∣∣∣∣∣∣∣∣∣∣∣

γX(0) γX(1) · · · γX(h− 2) 0γX(1) γX(0) · · · γX(h− 3) 0...

......

...

γX(h− 2) γX(h− 1) · · · γX(0) 0γX(h− 1) γX(h− 2) · · · γX(1) γX(0)−

∑h−1i=1 aiγX(i)

∣∣∣∣∣∣∣∣∣∣∣∣= σ 2�(0, h− 1).

Thus (h, h) = 1, which completes the proof. �

The result of Theorem B.1 is no longer valid without the assumption that the noise ηt is iid.We can, however, obtain the asymptotic behavior of the r(h)s from that of the ρ(h)s. Let

ρm = (ρX(1), . . . , ρX(m)), ρm = (ρX(1), . . . , ρX(m)),

rm = (rX(1), . . . , rX(m)), rm = (rX(1), . . . , rX(m)).

Theorem B.2 (Distribution of the r(h)s from that of the ρ(h)s) When n→∞, if

√n (ρm − ρm)

L→ N (0,ρm

),

then √n (rm − rm)

L→ N (0,rm

), rm = JmρmJ

′m,

where the elements of the Jacobian matrix Jm are defined by Jm(i, j) = ∂rX(i)/∂ρX(j) and arerecursively obtained for k = 2, . . . , m by

∂rX(1)/∂ρX(j) = a(j)

1,1 = 1l{1}(j),

∂rX(k)/∂ρX(j) = a(j)

k,k =dkn

(j)

k − nkd(j)

k

d2k

,

nk = ρX(k)−k−1∑i=1

ρX(k − i)ak−1,i ,

dk = 1−k−1∑i=1

ρX(i)ak−1,i ,

n(j)

k = 1l{k}(j)− ak−1,k−j −k−1∑i=1

ρX(k − i)a(j)

k−1,i ,

d(j)

k = −ak−1,j −k−1∑i=1

ρX(i)a(j)

k−1,i ,

a(j)

k,i = a(j)

k−1,i − a(j)

k,kak−1,k−i − ak,ka(j)

k−1,k−i , i = 1, . . . , k − 1,

where ai,j = 0 for j ≤ 0 or j > i.

Page 374: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

358 GARCH MODELS

Proof. It suffices to apply the delta method,2 considering rX(h) as a differentiable function ofρX(1), . . . , ρX(h). �

It follows that for a GARCH, and more generally for a weak white noise, ρ(h) and r(h) havethe same asymptotic distribution.

Theorem B.3 (Asymptotic distribution of r(h) and ρ(h) for weak noises) If X is a weakwhite noise and √

nρmL→ N (0, ρm

),

then √nrm

L→ N (0,ρm

).

Proof. Consider the calculation of the derivatives a(j)k,i when ρX(h) = 0 for all h = 0. It is clear

that ak,i = 0 for all k and all i. We then have dk = 1, nk = 0 and n(j)

k = 1l{k}(j). We thus havea(j)

k,k = 1l{k}(j), and then Jm = Im. �

The following result is stronger because it shows that, for a white noise, ρ(h) and r(h) areasymptotically equivalent.

Theorem B.4 (Equivalence between r(h) and ρ(h) for a weak noise) If (Xt ) is a weak whitenoise satisfying the condition of Theorem B.3 and, for all fixed h,

√n(ah−1,1, . . . , ah−1,h−1

) = OP (1), (B.11)

where(ah−1,1, . . . , ah−1,h−1

)′is the vector of estimated coefficients of the linear regression of Xt

on Xt−1, . . . , Xt−h+1 (t = h, . . . , n), then

ρ(h)− r(h) = OP (n−1).

Proof. The result is straightforward for h = 1. For h> 1, we have by (B.8),

r(h) = ρ(h)−∑h−1i=1 ρ(h− i)ah−1,i

1−∑h−1i=1 ρ(i)ah−1,i

.

In view of the assumptions,

ρ(k) = oP (1),(ah−1,1, . . . , ah−1,h−1

)′ = oP (1)

andρ(k)ah−1,i = OP (n

−1),

for i = 1, . . . , h− 1 and k = 1, . . . , h. Thus

n {ρ(h)− r(h)} = n∑h−1

i=1 ah−1,i {ρ(h− i)− ρ(i)ρ(h)}1−∑h−1

i=1 ρ(i)ah−1,i

= Op(1).

Under mild assumptions, the left-hand side of (B.11) tends in law to a nondegenerate normal,which entails (B.11).

2 If√n(Xn − µ)

L→ N (0, ), for Xn in Rm, and g : Rm → Rk is of class C1 in a neighborhood of µ,

then√n {g(Xn)− g(µ)} L→ N (0, JJ ′), where J = {∂g(x)/∂x′} (µ).

Page 375: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

AUTOCORRELATION AND PARTIAL AUTOCORRELATION 359

B.2 Generalized Bartlett Formula for Nonlinear Processes

Let X1, . . . , Xn be observations of a centered second-order stationary process X = (Xt). Theempirical autocovariances and autocorrelations are defined by

γX(h) = γX(−h) = 1

n

n−h∑t=1

XtXt+h, ρX(h) = ρX(−h) = γX(h)

γX(0), (B.12)

for h = 0, . . . , n− 1. The following theorem provides an expression for the asymptotic varianceof these estimators. This expression, which will be called Bartlett’s formula, is relatively easy tocompute. For the empirical autocorrelations of (strongly) linear processes, we obtain the standardBartlett formula, involving only the theoretical autocorrelation function of the observed process.For nonlinear processes admitting a weakly linear representation, the generalized Bartlett formulainvolves the autocorrelation function of the observed process, the kurtosis of the linear innovationprocess and the autocorrelation function of its square. This formula is obtained under a symmetryassumption on the linear innovation process.

Theorem B.5 (Bartlett’s formula for weakly linear processes) We assume that (Xt )t∈Z admitsthe Wold representation

Xt =∞∑

i=−∞ψiεt−i ,

∞∑i=−∞

|ψi | <∞,

where (εt )t∈Z is a weak white noise such that Eε4t := κε(Eε

2t )

2 <∞, and

Eεt1εt2εt3εt4 = 0 when t1 = t2, t1 = t3 and t1 = t4. (B.13)

With the notation ρε2 =∑+∞h=−∞ ρε2(h), we have

limn→∞ nCov {γX(i), γX(j)} = (κε − 3)γX(i)γX(j)

+∞∑

=−∞γX( ) {γX( + j − i)+ γX( − j − i)}

+ (ρε2 − 3)(κε − 1)γX(i)γX(j)

+ (κε − 1)∞∑

=−∞γX( − i) {γX( − j)+ γX( + j)} ρε2( ). (B.14)

If √n(γ0:m − γ0,m

) L→ N (0, γ0:m

)as n→∞,

where the elements of γ0:m are given by (B.14), then

√n (ρm − ρm)

L→ N (0,ρm

),

where the elements of ρm are given by the generalized Bartlett formula

limn→∞nCov {ρX(i), ρX(j)} = vij + v∗ij , i > 0, j > 0, (B.15)

Page 376: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

360 GARCH MODELS

vij =∞∑

=−∞ρX( )

[2ρX(i)ρX(j)ρX( )− 2ρX(i)ρX( + j)

− 2ρX(j)ρX( + i)+ ρX( + j − i)+ ρX( − j − i)],

v∗ij = (κε − 1)∞∑

=−∞ρε2( )

[2ρX(i)ρX(j)ρ

2X( )− 2ρX(j)ρX( )ρX( + i)

− 2ρX(i)ρX( )ρX( + j)+ ρX( + i) {ρX( + j)+ ρX( − j)}] .Remark B.1 (Symmetry condition (B.13) for a GARCH process) In view of (7.24), we knowthat if (εt ) is a GARCH process with a symmetric distribution for ηt and if Eε4

t <∞, then (B.13)is satisfied (see Exercise 7.12).

Remark B.2 (The generalized formula contains the standard formula) The right-hand sideof (B.14) is a sum of four terms. When the sequence (ε2

t ) is uncorrelated, the sum of the lastterms terms is equal to

−2(κε − 1)γX(i)γX(j)+ (κε − 1)γX(i) {γX(j)+ γX(−j)} = 0.

In this case, we retrieve the standard Bartlett formula (1.1). We also have v∗ij = 0, and we retrievethe standard Bartlett formula for the ρ(h)s.

Example B.1 If we have a white noise Xt = εt that satisfies the assumptions of the theorem,then we have the generalized Bartlett formula (B.15) with{

vi,j = v∗i,j = 0, for i = j

vi,i = 1 and v∗i,i =γε2 (i)

γ 2ε (0)

, for i > 0.

Proof. Using (B.13) and the notation ψi1,i2,i3,i4 = ψi1ψi2ψi3ψi4 , we obtain

EXtXt+iXt+hXt+j+h =∑

i1,i2,i3,i4

ψi1,i2,i3,i4Eεt−i1εt+i−i2εt+h−i3εt+j+h−i4 (B.16)

=∑i1,i3

ψi1,i1+i,i3,i3+jEε2t−i1ε

2t+h−i3 +

∑i1,i2

ψi1,i1+h,i2,i2+h+j−iEε2t−i1ε

2t+i−i2

+∑i1,i2

ψi1,i1+h+j,i2,i2+h−iEε2t−i1ε

2t+i−i2 − 2Eε4

t

∑i1

ψi1,i1+i,i1+h,i1+h+j .

The last equality is obtained by summing over the i1, i2, i3, i4 such that the indices of{εt−i1 , εt+i−i2 , εt+h−i3 , εt+j+h−i4

}are pairwise equal, which corresponds to three sums, and then

by subtracting twice the sum in which the indices of{εt−i1 , εt+i−i2 , εt+h−i3 , εt+j+h−i4

}are all

equal (the latter sum being counted three times in the first three sums). We also have

γX(i) =∑i1,i2

ψi1ψi2Eεt−i1εt+i−i2 = γε(0)∑i1

ψi1ψi1+i . (B.17)

By stationarity and the dominated convergence theorem, it is easily shown that

limn→∞nCov {γX(i), γX(j)} =

∞∑h=−∞

Cov{XtXt+i , Xt+hXt+j+h

}.

Page 377: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

AUTOCORRELATION AND PARTIAL AUTOCORRELATION 361

The existence of this sum is guaranteed by the conditions∑i

|ψi | <∞ and∑h

|ρε2(h)| <∞.

In view of (B.16) and (B.17), this sum is equal to∑i1,i3

ψi1,i1+i,i3,i3+j∑h

(Eε2

t−i1ε2t+h−i3 − γ 2

ε (0))

+∑h,i1,i2

ψi1,i1+h,i2,i2+h+j−iEε2t−i1ε

2t+i−i2

+∑h,i1,i2

ψi1,i1+h+j,i2,i2+h−iEε2t−i1ε

2t+i−i2 − 2Eε4

t

∑h,i1

ψi1,i1+i,i1+h,i1+h+j

= ρε2

∑i1

ψi1ψi1+i∑i3

ψi3ψi3+j

× γε2 (0)

+∑i1,i2

ψi1ψi2Eε2t−i1ε

2t+i−i2

∑h

ψi1+hψi2+h+j−i

+∑i1,i2

ψi1ψi2Eε2t−i1ε

2t+i−i2

∑h

ψi1+h+jψi2+h−i

− 2Eε4t

∑i1

ψi1ψi1+i∑h

ψi1+hψi1+h+j ,

using Fubini’s theorem. With the notation Eε4t = κεγ

2ε (0), and using once again (B.17) and the

relation γε2(0) = (κε − 1)γ 2ε (0), we obtain

limn→∞nCov {γX(i), γX(j)} = γε2 (0)ρε2γ

−2ε (0)γX(i)γX(j)

+∑i1,i2

ψi1ψi2Eε2t−i1ε

2t+i−i2γ

−1ε (0)γX(i2 + j − i − i1)

+∑i1,i2

ψi1ψi2Eε2t−i1ε

2t+i−i2γ

−1ε (0)γX(i2 − j − i − i1)

− 2Eε4t γ−2ε (0)γX(i)γX(j)

= {(κε − 1)ρε2 − 2κε}γX(i)γX(j)

+ γ−1ε (0)

∑i1,i2

ψi1ψi2 {γX(i2 + j − i − i1)+ γX(i2 − j − i − i1)}

× {γε2(i − i2 + i1)+ γ 2ε (0)

}.

With the change of index = i2 − i1, we finally obtain

limn→∞ nCov {γX(i), γX(j)} =

{(κε − 1)ρε2 − 2κε

}γX(i)γX(j) (B.18)

+ γ−2ε (0)

∞∑ =−∞

γX( ) {γX( + j − i)+ γX( − j − i)} {γε2 (i − )+ γ 2ε (0)

},

Page 378: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

362 GARCH MODELS

which can also be written in the form (B.14) (see Exercise 1.11). The vector (ρX(i), ρX(j)) is afunction of the triplet (γX(0), γX(i), γX(j)). The Jacobian of this differentiable transformation is

J = − γX(i)

γ 2X(0)

1γX(0)

0

− γX(j)

γ 2X(0)

0 1γX(0)

.

Let be the matrix of asymptotic variance of the triplet, whose elements are given by (B.18). Bythe delta method, we obtain

limn→∞ nCov {ρ(i), ρ(j)} = JJ ′(1, 2)

= γX(i)γX(j)

γ 4X(0)

(1, 1)− γX(i)

γ 3X(0)

(1, 3)− γX(j)

γ 3X(0)

(2, 1)+ 1

γ 2X(0)

(2, 3)

= {(κε − 1)ρε2 − 2κε} {

2γX(i)γX(j)

γ 2X(0)

− 2γX(i)γX(j)

γ 2X(0)

}

+ γ−2ε (0)

∞∑ =−∞

[γX(i)γX(j)

γ 4X(0)

2γ 2X( )

{γε2 (− )+ γ 2

ε (0)}

− γX(i)

γ 3X(0)

γX( ) {γX( + j)+ γX( − j)} {γε2(− )+ γ 2ε (0)

}−γX(j)γ 3X(0)

γX( ) {γX( − i)+ γX( − i)} {γε2 (i − )+ γ 2ε (0)

}+ 1

γ 2X(0)

γX( ) {γX( + j − i)+ γX( − j − i)} {γε2 (i − )+ γ 2ε (0)

}].

Simplifying and using the autocorrelations, the previous quantity is equal to

∞∑ =−∞

[2ρX(i)ρX(j)ρ

2X( )− ρX(i)ρX( ) {ρX( + j)+ ρX( − j)}

−ρX(j)ρX( ) {ρX( − i)+ ρX( − i)} + ρX( ) {ρX( + j − i)+ ρX( − j − i)}]+ (κε − 1)

∞∑ =−∞

ρε2( )[2ρX(i)ρX(j)ρ2

X( ) − ρX(i)ρX( ) {ρX( + j)+ ρX( − j)}

−ρX(j)ρX( − i) {ρX( )+ ρX( )} +ρX(i − ) {ρX(− + j)+ ρX(− − j)}] .Noting that

ρX( )ρX( + j) =∑

ρX( )ρX( − j),

Page 379: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

AUTOCORRELATION AND PARTIAL AUTOCORRELATION 363

we obtain

limn→∞nCov {ρ(i), ρ(j)}

=∞∑

=−∞ρX( )

[2ρX(i)ρX(j)ρX( )− 2ρX(i)ρX( + j)

− 2ρX(j)ρX( + i)+ ρX( + j − i)+ ρX( − j − i)]

+ (κε − 1)∞∑

=−∞ρε2( )

[2ρX(i)ρX(j)ρ

2X( ) − 2ρX(i)ρX( )ρX( + j)

− 2ρX(j)ρX( )ρX( + i) +ρX( + i) {ρX( + j)+ ρX( − j)}] .�

Page 380: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 381: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Appendix C

Solutions to the Exercises

Chapter 1

1.1 .1. .(a) We have the stationary solution Xt =∑

i≥0 0.5i (ηt−i + 1), with mean EXt = 2 andautocorrelations ρX(h) = 0.5|h|.

(b) We have an ‘anticipative’ stationary solution

Xt = −1− 1

2

∑i≥0

0.5iηt+i+1,

which is such that EXt = −1 and ρX(h) = 0.5|h|.

(c) The stationary solution

Xt = 2+∑i≥0

0.5i (ηt−i − 0.4ηt−i−1)

is such that EXt = 2 with ρX(1) = 2/19 and ρX(h) = 0.5h−1ρX(1) for h> 1.

2. The compatible models are respectively ARMA(1, 2), MA(3) and ARMA(1, 1).

3. The first noise is strong, and the second is weak because

Cov{(ηtηt−1)

2 , (ηt−1ηt−2)2} = Eη2

t η4t−1η

2t−2 − 1 = 0.

Note that, by Jensen’s inequality, this correlation is positive.

1.2 Without loss of generality, assume Xt = Xn for t < 1 or t > n. We have

n−1∑h=−n+1

γ (h) = 1

n

∑h,t

(Xt − Xn)(Xt+h −Xn) = 1

n

{n∑t=1

(Xt −Xn)

}2

= 0,

which gives 1+ 2∑n−1

h=1 ρ(h) = 0, and the result follows.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 382: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

366 GARCH MODELS

1.3 Consider the degenerate sequence (Xt )t=0,1,... defined, on a probability space (�,A,P),by Xt(ω) = (−1)t for all ω ∈ � and all t ≥ 0. With probability 1, the sequence {(−1)t }is the realization of the process (Xt). This process is nonstationary because, for instance,EX0 = EX1.

Let U be a random variable, uniformly distributed on {0, 1}. We define the process(Yt )t=0,1,... by

Yt (ω) = (−1)t+U(ω)

for any ω ∈ � and any t ≥ 0. The process (Yt ) is stationary. We have in particular EYt = 0and Cov (Yt , Yt+h) = (−1)h. With probability 1/2, the realization of the stationary process(Yt ) will be the sequence {(−1)t } (and with probability 1/2, it will be {(−1)t+1}).

This example leads us to think that it is virtually impossible to determine whether aprocess is stationary or not, from the observation of only one trajectory, even of infinite length.However, practitioners do not consider {(−1)t } as a potential realization of the stationaryprocess (Yt ). It is more natural, and simpler, to suppose that {(−1)t } is generated by thenonstationary process (Xt ).

1.4 The sequence 0, 1, 0, 1, . . . is a realization of the process Xt = 0.5(1+ (−1)tA), where A isa random variable such that P [A = 1] = P [A = −1] = 0.5. It can easily be seen that (Xt)

is strictly stationary.Let �∗ = {ω | X2t = 1, X2t+1 = 0, ∀t}. If (Xt ) is ergodic and stationary, the empirical

means n−1∑nt=1 1lX2t+1=1 and n−1∑n

t=1 1lX2t=1 both converge to the same limit P [Xt = 1]with probability 1, by the ergodic theorem. For all ω ∈ �∗ these means are respectively equalto 1 and 0. Thus P(�∗) = 0. The probability of such a trajectory is thus equal to zero forany ergodic and stationary process.

1.5 We have Eεt = 0, Var εt = 1 and Cov(εt , εt−h) = 0 when h = 0, thus (εt ) is a weak whitenoise. We also have Cov(ε2

t , ε2t−1) = Eη2

t η4t−1 . . . η

4t−kη

2t−k−1 − 1 = 3k − 1 = 0, thus εt and

εt−1 are not independent, which shows that (εt ) is not a strong white noise.

1.6 Assume h> 0. Define the random variable ρ(h) = γ (h)/γ (0), where γ (h) =n−1∑n

t=1 εtεt−h. It is easy to see that√nρ(h) has the same asymptotic variance

(and also the same asymptotic distribution) as√nρ(h). Using γ (0)→ 1, stationarity, and

Lebesgue’s theorem, this asymptotic variance is equal to

Var√nγ (h) = n−1

n∑t,s=1

Cov (εt εt−h, εsεs−h)

= n−1n−1∑

=−n+1

(n− | |)Cov (ε1ε1−h, ε1+ ε1+ −h)

→∞∑

=−∞Cov (ε1ε1−h, ε1+ ε1+ −h)

= Eε21ε

21−h =

{3k−h+1 if 0 < h ≤ k

1 if h>k.

This value can be arbitrarily larger than 1, which is the value of the asymptotic variance ofthe empirical autocorrelations of a strong white noise.

1.7 It is clear that (ε2t ) is a second-order stationary process. By construction, εt and εt−h are

independent when h>k, thus γε2 (h) := Cov(ε2t , ε

2t−h) = 0 for all h> k. Moreover, γε2 (h) =

3k+1−h − 1, for h = 0, . . . , k. In view of Theorem 1.2, ε2t − 1 thus follows an MA(k) process.

Page 383: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 367

In the case k = 1, we haveε2t = 1+ ut + but−1,

where |b| < 1 and (ut ) is a white noise of variance σ 2. The coefficients b and σ 2 aredetermined by

γε2(0) = 8 = σ 2(1+ b2), γε2(1) = 2 = bσ 2,

which gives b = 2−√3 and σ 2 = 2/b.

1.8 Reasoning as in Exercise 1.6, the asymptotic variance is equal to

Eε21ε

21−h

(Eε21)

2= E

η21

η21−k

η21−h

η21−h−k

(E

η21

η21−k

)−2

={ (

Eη21Eη

−21

)−1if 0 < h = k

1 if 0 < h = k.

Since E(η−21 ) ≥ (Eη2

1)−1, for k = h the asymptotic variance can be arbitrarily smaller than

1, which corresponds to the asymptotic variance of the empirical autocorrelations of a strongwhite noise.

1.9 .1. We have ∥∥∥∥∥n∑

k=makηt−k

∥∥∥∥∥2

≤n∑

k=m|a|kσ → 0

when n>m and m→∞. The sequence {ut (n)}n defined by un =∑n

k=0 akηt−k is a

Cauchy sequence in L2, and thus converges in quadratic mean. A priori ,

∞∑k=0

|akηt−k| := limn↑

n∑k=0

|akηt−k|

exists in R ∪ +{∞}. Using Beppo Levi’s theorem,

E

∞∑k=0

|akηt−k| = (E|ηt |)∞∑k=0

|ak | <∞,

which shows that the limit∑∞

k=0 |akηt−k| is finite almost surely. Thus, as n→∞, ut (n)converges, both almost surely and in quadratic mean, to ut =

∑∞k=0 a

kηt−k . Since

ut (n) = aut−1(n− 1)+ ηt , ∀n,

we obtain, taking the limit as n→∞ of both sides of the equality, ut = aut−1 + ηt . Thisshows that (Xt) = (ut ) is a stationary solution of the AR(1) equation.

Finally, assume the existence of two stationary solutions to the equation Xt = aXt−1 +ηt and ut = aut−1 + ηt . If ut0 = Xt0 , then

0 <∣∣ut0 −Xt0

∣∣ = |an| ∣∣ut0−n −Xt0−n∣∣ , ∀n,

which entails

lim supn→∞

|ut0−n| = +∞ or lim supn→∞

|Xt0−n| = +∞.

This is in contradiction to the assumption that the two sequences are stationary, whichshows the uniqueness of the stationary solution.

Page 384: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

368 GARCH MODELS

2. We have Xt = ηt + aηt−1 + · · · + akηt−k + ak+1Xt−k−1. Since |a| = 1,

Var(Xt − ak+1Xt−k−1

) = (k + 1)σ 2 →∞

as k→∞. If (Xt) were stationary,

Var(Xt − ak+1Xt−k−1

) = 2 {VarXt ± Cov (Xt , Xt−k−1)} ,

and we would have

limk→∞

|Cov (Xt ,Xt−k−1)| = ∞.

This is impossible, because by the Cauchy–Schwarz inequality,

|Cov (Xt ,Xt−k−1)| ≤ VarXt .

3. The argument used in part 1 shows that

vt (n) := −n∑

k=1

1

akηt+k

n→∞→ vt = −∞∑k=1

1

akηt+k

almost surely and in quadratic mean. Since

vt (n) = avt−1(n+ 1)+ ηt

for all n, (vt ) is a stationary solution (which is called anticipative, because it is a functionof the future values of the noise) of the AR(1) equation. The uniqueness of the stationarysolution is shown as in part 1.

4. The autocovariance function of the stationary solution is

γ (0) = σ 2∞∑k=1

1

a2k= σ 2

a2 − 1, γ (h) = 1

aγ (h− 1) h> 0.

We thus have Eεt = 0 and, for all h> 0,

Cov(εt , εt−h) = γ (h)− 1

aγ (h− 1)− 1

aγ (h+ 1)+ 1

a2γ (h) = 0,

which confirms that εt is a white noise.

1.10 In Figure 1.6(a) we note that several empirical autocorrelations are outside the 95% signif-icance band, which leads us to think that the series may not be the realization of a strongwhite noise. Inspection of Figure 1.6(b) confirms that the observed series ε1, . . . , εn cannotbe generated by a strong white noise, otherwise the series ε2

1 , . . . , ε2n would also be uncorre-

lated. Clearly, this is not the case, because several empirical autocorrelations go far beyondthe significance band. By contrast, it is plausible that the series is a weak noise. We know thatBartlett’s formula giving the limits ±1.96/

√n is not valid for a weak noise (see Exercises

1.6 and 1.8). On the other hand, we know that the square of a weak noise can be correlated(see Exercise 1.7).

Page 385: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 369

1.11 Using the relation γε2(0) = (ηε − 1)γ 2ε (0), equation (B.18) can be written as

limn→∞nCov {γX(i), γX(j)} =

{(ηε − 1)ρε2 − 2ηε

}γX(i)γX(j)

+ γ−2ε (0)

∞∑ =−∞

γX( ) {γX( + j − i)+ γX( − j − i)} {γε2(i − )+ γ 2ε (0)

}= (ρε2 − 3)(ηε − 1)γX(i)γX(j)+ (ηε − 3)γX(i)γX(j)

+∞∑

=−∞γX( ) {γX( + j − i)+ γX( − j − i)}

+ (ηε − 1)∞∑

=−∞γX( ) {γX( + j − i)+ γX( − j − i)} ρε2 (i − ).

With the change of index h = i − , we obtain

∞∑ =−∞

γX( ) {γX( + j − i)+ γX( − j − i)} ρε2(i − )

=∞∑

h=−∞γX(−h+ i) {γX(−h+ j)+ γX(−h− j)} ρε2 (h),

which gives (B.14), using the parity of the autocovariance functions.

1.12 We can assume i ≥ 0 and j ≥ 0. Since γX( ) = γε( ) = 0 for all = 0, formula (B.18)yields

limn→∞ nCov {γX(i), γX(j)} = γ−1

ε (0) {γX(j − i)+ γX(−j − i)} {γε2 (i)+ γ 2ε (0)

}for (i, j) = (0, 0) and

limn→∞nCov {γX(0), γX(0)} =

{(ηε − 1)ρε2 − 2ηε

}γ 2ε (0)+ 2

{γε2 (i)+ γ 2

ε (0)}.

Thus

limn→∞ nCov {γX(i), γX(j)} =

0 if i = j

Eε2t ε

2t+i if i = j = 0

γε2 (0)ρε2 − 2Eε4t + 2Eε2

t ε2t+i if i = j = 0.

In formula (B.15), we have vij = 0 when i = j and vii = 1. We also have v∗ij = 0 wheni = j and v∗ii = (ηε − 1)ρε2(i) for all i = 0. Since (ηε − 1) = γε2 (0)γ−2

ε (0), we obtain

limn→∞nCov {ρX(i), ρX(j)} =

{0 if i = jEε2

t ε2t+i

γ 2ε (0)

if i = j = 0.

For significance intervals Ch of asymptotic level 1− α, such that limn→∞ P [ρ(h) ∈ Ch] =1− α, we have

M =m∑h=1

1lρ(h)∈Ch .

Page 386: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

370 GARCH MODELS

By definition of Ch,

E(M) =m∑h=1

P [ρ(h) ∈ Ch] → m(1− α), as n→∞.

Moreover,

Var(M) =m∑h=1

Var(1lρ(h)∈Ch)+∑h=h′

Cov(1lρ(h)∈Ch, 1lρ(h′)∈Ch′ )

=m∑h=1

P(ρ(h) ∈ Ch){1− P(ρ(h) ∈ Ch)}

+∑h=h′

{P(ρ(h) ∈ Ch, ρ(h′) ∈ Ch′)− P(ρ(h) ∈ Ch)P (ρ(h

′) ∈ Ch′ )}

→ mα(1− α), as n→∞.

We have used the convergence in law of (ρ(h), ρ(h′)) to a vector of independent variables.When the observed process is not a noise, this asymptotic independence does not hold ingeneral.

1.13 The probability that all the empirical autocorrelations stay within the asymptotic signifi-cance intervals (with the notation of the solution to Exercise 1.12) is, by the asymptoticindependence,

P [ρ(1) ∈ C1, . . . , ρ(m) ∈ Cm] → (1− α)m, as n→∞.

For m = 20 and α = 5%, this limit is equal to 0.36. The probability of not rejecting the rightmodel is thus low.

1.14 In view of (B.7) we have rX(1) = ρX(1). Using step (B.8) with k = 2 and a1,1 = ρX(1), weobtain

rX(2) = a2,2 = ρX(2)− ρ2X(1)

1− ρ2X(1)

.

Then, step (B.9) yields

a2,1 = ρX(1)− a2,2ρX(1)

= ρX(1)− ρX(2)− ρ2X(1)

1− ρ2X(1)

ρX(1) = ρX(1)1− ρX(2)

1− ρ2X(1)

.

Finally, step (B.8) yields

rX(3) = ρX(3)− ρX(2)a2,1 − ρX(1)a2,2

1− ρX(1)a2,1 − ρX(2)a2,2

= ρX(3){1− ρ2

X(1)}− ρX(2)ρX(1) {1− ρX(2)} − ρX(1)

{ρX(2)− ρ2

X(1)}

1− ρ2X(1)− ρ2

X(1) {1− ρX(2)} − ρX(2){ρX(2)− ρ2

X(1)} .

Page 387: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 371

SP 500 from 1/3/50 to 7/24/09S

P50

0 P

rices

0 5000 10000 15000

050

010

0015

00

SP500 Returns

SP

500

Ret

urns

0 5000 10000 15000

0.20

0.10

0.00

0.10

0 10 20 30 40

−0.0

50.

050.

15

AC

F

Autocorrelations of the returns

0 10 20 30 40

−0.0

50.

050.

15

AC

F

ACF of the squared returns

Figure C.1 Closing prices and returns of the S&P 500 index from January 3, 1950 to July 24,2009.

1.15 The historical data from January 3, 1950 to July 24, 2009 can be downloaded via the URLhttp://fr.finance.yahoo.com/q/hp?s = %5EGSPC. We obtain Figure C.1 with thefollowing R code:

> # reading the SP500 data set

> sp500data <- read.table("sp500.csv",header=TRUE,sep=",")

> sp500<-rev(sp500data$Close) # closing price> n<-length(sp500)

> rend<-log(sp500[2:n]/sp500[1:(n-1)]); rend2<-rend^2

> op <- par(mfrow = c(2, 2)) # 2 × 2 figures per page

> plot(ts(sp500),main="SP 500 from 1/3/50 to 7/24/09",+ ylab="SP500 Prices",xlab="")

> plot(ts(rend),main="SP500 Returns",ylab="SP500 Returns",xlab="")

> acf(rend, main="Autocorrelations of the returns",xlab="",ylim=c(-0.05,0.2))

> acf(rend2, main="ACF of the squared returns",xlab="",ylim=c(-0.05,0.2))> par(op)

Chapter 2

2.1 This covariance is meaningful only if Eε2t <∞ and Ef 2(εt−h) <∞. Under these assump-

tions, the equality is true and follows from E(εt | εu, u < t) = 0.

Page 388: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

372 GARCH MODELS

2.2 In case (i) the strict stationarity condition becomes α + β < 1. In case (ii) elementary integralcomputations show that the condition is

2

√β

3αarctan

√3α

β+ log(3α + β) < 2.

2.3 Let λ1, . . . , λm be the eigenvalues of A. If A is diagonalizable, there exist an invertible matrixP and a diagonal matrix D such that A = P−1DP. It follows that, taking a multiplicativenorm,

log ‖At‖ = log ‖P−1DtP ‖ ≤ log ‖P−1‖‖Dt‖‖P ‖= log ‖P−1‖ + log ‖Dt‖ + log ‖P ‖.

For the multiplicative norm ‖A‖ =∑ |aij | we have log ‖Dt‖ = log∑m

i=1 λti . The result fol-

lows immediately.When A is any square matrix, the Jordan representation can be used. Let ni be the

multiplicity of the eigenvalue λi . We have the Jordan canonical form A = P−1JP, where Pis invertible and J is the block-diagonal matrix with a diagonal of m matrices Ji(λi), of sizeni × ni , with λi on the diagonal, 1 on the superdiagonal, and 0 elsewhere. It follows thatAt = P−1J tP where J t is the block-diagonal matrix whose blocks are the matrices J ti (λi).We have Ji(λi) = λiIni +Ni where Ni is such that Nni

i = 0ni×ni . It can be assumed that|λ1|> |λ2|> . . . > |λm|. It follows that

1

tlog ‖J t‖ = 1

tlog

m∑i=1

‖J ti (λi)‖ =1

tlog

m∑i=1

‖(λiIni +Ni)t‖

= |λ1| + 1

tlog

m∑i=1

∥∥∥∥( λiλ1Ini +

1

λ1Ni

)t∥∥∥∥= |λ1| + 1

tlog

m∑i=1

ni−1∑k=0

(t

k

) ∣∣∣∣ λiλ1

∣∣∣∣t−k 1

|λ1|k ‖Nki ‖

→ |λ1|

as t →∞, and the proof easily follows.

2.4 We use the multiplicative norm ‖A‖ =∑ |aij |. Thus log ‖Azt‖ ≤ log ‖A‖ + log ‖zt‖, there-fore log+ ‖Azt‖ ≤ log ‖A‖ + log+ |zt |, which admits a finite expectation by assumption. Itfollows that γ exists. We have

log (‖AtAt−1 . . . A1‖) =(

log ‖At‖t∏

i=1

zi

)= log ‖At‖ +

t∑i=1

log |zi |

and thus

γ = limt→∞ a.s.

(1

tlog ‖At‖ + 1

t

t∑i=1

log |zi |).

Using (2.21) and the ergodic theorem, we obtain

γ = log ρ(A)+ E log |zt |.

Consequently, γ < 0 if and only if ρ(A) < exp (−E log |zt |) .

Page 389: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 373

2.5 For the Euclidean norm, multiplicativity follows from the Cauchy–Schwarz inequality. SinceN(A) = supx =0 ‖Ax‖/‖x‖, we have

N(AB) = supx =0,Bx =0

‖ABx‖‖Bx‖

‖Bx‖‖x‖ ≤ sup

x =0,Bx =0

‖ABx‖‖Bx‖ sup

x =0

‖Bx‖‖x‖ ≤ N(A)N(B).

To show that the norm N1 is not multiplicative, consider the matrix A whose elements areall equal to 1: we then have N1(A) = 1 but N1(A

2)> 1.

2.6 We have

A∗t =

β1 + α1η2t−1 β2 · · · βp α2 · · · αq

1 0 · · · 0 0 · · · 00 1 · · · 0 0 · · · 0

.... . .

. . ....

.... . .

. . ....

0 . . . 1 0 0 . . . 0 0

η2t−1 0 · · · 0 · · · 0

0 · · · 0 1 0 · · · 00 · · · 0 0 1 · · · 0

.... . .

. . ....

.... . .

. . ....

0 . . . 0 0 0 . . . 1 0

and b∗t = (ω, 0, . . . , 0)′.

2.7 We have εt =√ω + α1ε

2t−1 + α2ε

2t−2ηt , therefore, under the condition α1 + α2 < 1, the

moment of order 2 is given by

Eε2t =

ω

1− α1 − α2

(see Theorem 2.5 and Remark 2.6(1)). The strictly stationary solution satisfies

Eε4t = µ4E(ω + α1ε

2t−1 + α2ε

2t−2)

2

= µ4{ω2 + (α21 + α2

2)Eε4t + 2ω(α1 + α2)Eε

2t + 2α1α2Eε

2t ε

2t−1}

in R ∪ {+∞}. Moreover,

Eε2t ε

2t−1 = E(ω + α1ε

2t−1 + α2ε

2t−2)ε

2t−1 = ωEε2

t + α1Eε4t + α2Eε

2t−2ε

2t−1,

which gives

(1− α2)Eε2t ε

2t−1 = ωEε2

t + α1Eε4t .

Using this relation in the previous expression for Eε4t , we obtain

Eε4t

[1− µ4

1− α2

{α2

1(1+ α2)+ α22(1− α2)

}]= µ4

[ω2 + 2ω2

(1− α2)(1− α1 − α2){α1 + α2(1− α2)}

].

Page 390: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

374 GARCH MODELS

0.1 0.2

a2

a1

0.3 0.4 0.5

0.1

0.2

0.3

0.4

0.5

Figure C.2 Region of existence of the fourth-order moment for an ARCH(2) model (whenµ4 = 3).

If Eε4t <∞, then the term in brackets on the left-hand side of the equality must be strictly

positive, which gives the condition for the existence of the fourth-order moment. Note thatthe condition is not symmetric in α1 and α2. In Figure C.2, the points (α1, α2) under thecurve correspond to ARCH(2) models with a fourth-order moment. For these models,

Eε4t =

µ4ω2(1+ α1 + α1α2 − α2

2)

(1− α1 − α2)[1− α2 − µ4

{α2

1(1+ α2)+ α22(1− α2)

}] .2.8 We have seen that (ε2

t ) admits the ARMA(1, 1) representation

ε2t − (α + β)ε2

t−1 = ω + νt − βνt−1,

where νt = ε2t − E(ε2

t |εu, u < t) is a (weak) white noise. The autocorrelation function of ε2t

thus satisfies

ρε2 (h) = (α + β)ρε2(h− 1), ∀h> 1. (C.1)

Using the MA(∞) representation

ε2t =

ω

1− α − β+ νt + α

∞∑i=1

(α + β)i−1νt−i ,

we obtain

γε2(0) = Eν2t

(1+ α2

∞∑i=1

(α + β)2(i−1)

)= Eν2

t

(1+ α2

1− (α + β)2

)and

γε2(1) = Eν2t

(α + α2(α + β)

∞∑i=1

(α + β)2(i−1)

)= Eν2

t

(α + α2(α + β)

1− (α + β)2

).

It follows that the lag 1 autocorrelation is

ρε2 (1) = α(1− β2 − αβ

)1− β2 − 2αβ

.

Page 391: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 375

The other autocorrelations are obtained from (C.1) and ρε2 (1). To determine the autocovari-ances, all that remains is to compute

Eν2t = E(ε2

t − σ 2t )

2 = E(η2t − 1)2Eσ 4

t = 2Eσ 4t ,

which is given by

Eσ 4t = E(ω + αε2

t + βσ 2t )

2

= ω2 + 3α2Eσ 4t + β2Eσ 4

t + 2ω(α + β)Eσ 2t + 2αβEσ 4

t

=ω2 + 2ω(α + β) ω

1−α−β1− 3α2 − β2 − 2αβ

= ω2(1+ α + β)

(1− α − β)(1− 3α2 − β2 − 2αβ).

2.9 The vectorial representation zt= bt + Atzt−1 is(

ε2t

σ 2t

)=(

ωη2t

ω

)+(

αη2t βη2

t

α β

)(ε2t−1σ 2t−1

).

We have

z(1) = ω

1− α − β

(11

), b(1) = ω

(11

),

EAt ⊗ bt = Ebt ⊗ At = ω

3α 3βα β

α β

α β

, A(1) =(

α β

α β

)

b(2) = ω2

3111

, A(2) =

3α2 3αβ 3αβ 3β2

α2 αβ αβ β2

α2 αβ αβ β2

α2 αβ αβ β2

.

The eigenvalues of A(2) are 0, 0, 0 and 3α2 + 2αβ + β2, thus I4 − A(2) is invertible (0 isan eigenvalue of I4 − A(2) if and only if 1 is an eigenvalue of A(2)), and the system (2.63)admits a unique solution. We have

b(2) + (EAt ⊗ bt + Ebt ⊗ At

)z(1) = ω2 1+ α + β

1− α − β

3111

.

The solution to (2.63) is

z(2) = ω2(1+ α + β)

(1− α − β)(1− 3α2 − 2αβ − β2)

3111

.

As first component of this vector we recognize Eε4t , and the other three components are

equal to Eσ 4t . Equation (2.64) yields

Ezt⊗ z

t−h =ω2

1− α − β

1111

+

α 0 β 00 α 0 β

α 0 β 00 α 0 β

Ezt⊗ z

t−h+1,

Page 392: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

376 GARCH MODELS

which gives γε2 (·), but with tedious computations, compared to the direct method utilized inExercise 2.8.

2.10 .1. Subtracting the (q + 1)th line of (λIp+q − A) from the first, then expanding the determi-nant along the first row, and using (2.32), we obtain

det(λIp+q − A) = λq

∣∣∣∣∣∣∣∣∣∣∣

λ− β1 −β2 · · · −βp−1 λ · · · 00 −1 · · · 0...

. . .. . .

...

0 · · · −1 λ

∣∣∣∣∣∣∣∣∣∣∣

+ (−1)q(−λ)

∣∣∣∣∣∣∣∣∣∣∣

−1 λ · · · 00 −1 · · · 0...

. . .. . .

...

0 · · · −1 λ

−α1 −α2 · · · −αq

∣∣∣∣∣∣∣∣∣∣∣λp−1

= λp+q

1−p∑j=1

βjλ−j

+ (−1)q+1(−1)q−1λp

∣∣∣∣∣∣∣∣∣∣∣

−α1 −α2 · · · −αq−1 λ · · · 00 −1 · · · 0...

. . .. . .

...

0 · · · −1 λ

∣∣∣∣∣∣∣∣∣∣∣= λp+q

1−p∑j=1

βjλ−j

+ λp+q

(1− (α1 + λ)λ−1 −

q∑i=2

αiλ−i),

and the result follows.

2. When∑q

i=1 αi +∑p

j=1 βj = 1 the previous determinant is equal to zero at λ = 1. Thusρ(A) ≥ 1. Now, let λ be a complex number of modulus strictly greater than 1. Using theinequality |a − b| ≥ |a| − |b|, we then obtain

∣∣det(A− λIp+q)∣∣> 1−

r∑i=1

(αi + βi) = 0.

It follows that ρ(A) ≤ 1 and thus ρ(A) = 1.

2.11 For all ε > 0, noting that the function f (t) = P(t−1|X1|>ε

)is decreasing, we have

∞∑n=1

P(n−1|Xn|>ε

) = ∞∑n=1

P(n−1|X1|>ε

) ≤ ∫ ∞

0P(t−1|X1|>ε

)dt

=∫ ∞

0P(ε−1|X1|> t

)dt = ε−1E|X1| <∞.

Page 393: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 377

The convergence follows from the Borel–Cantelli lemma.Now, let (Xn) be an iid sequence of random variables with density f (x) = x−2 1lx≥1.

For all K > 0, we have

∞∑n=1

P(n−1Xn >K) =∞∑n=1

1

nK= +∞.

The events {n−1Xn >K} being independent, we can use the counterpart of the Borel–Cantellilemma: the event {n−1Xn >K for an infinite number of n} has probability 1. Thus, withprobability 1, the sequence (n−1Xn) does not tend to 0.

2.12 First note that the last r − 1 lines of BtA are the first r − 1 lines of A, for any matrix A

of appropriate size. The same property holds true when Bt is replaced by E(Bt). It followsthat the last r − 1 lines of E(BtA) are the last r − 1 lines of E(Bt )E(A). Moreover, it canbe shown, by induction on t , that the ith line i,t−i of Bt . . . B1 is a measurable function ofthe ηt−j , for j ≥ i. The first line of Bt+1Bt . . . B1 is thus of the form a1(ηt ) 1,t−1 + · · · +ar (ηt−r ) r,t−r . Since

E{a1(ηt ) 1,t−1 + · · · + ar(ηt−r ) r,t−r} = Ea1(ηt )E 1,t−1 + · · · + Ear(ηt−r )E r,t−r ,

the first line of EBt+1Bt . . . B1 is thus the product of the first line of EBt+1 and of EBt . . . B1.The conclusion follows.

2.13 .1. For any fixed t , the sequence(z(K)t

)K

converges a.s. (to zt) as K →∞. Thus

‖z(K)t− z(K−1)

t‖ → 0 a.s.

and the first convergence follows. Now note that we have

E‖z(K)t− z(K−1)

t‖s ≤ E

(‖z(K)t‖ + ‖z(K−1)

t‖)s

≤ E‖z(K)t‖s + E‖z(K−1)

t‖s <∞.

The first inequality uses (a + b)s ≤ as + bs for a, b ≥ 0 and s ∈ (0, 1]. The secondinequality is a consequence of Eε2s

t <∞. The second convergence then follows fromthe dominated convergence theorem.

2. We have z(K)t − z(K−1)t = AtAt−1 . . . At−K+1bt−K . The convergence follows from the pre-

vious question, and from the strict stationarity, for any fixed integer K , of the sequence{z(K)t − z

(K−1)t , t ∈ Z

}.

3. We have

‖XnY‖s =∑

i,j

|Xn,ijYj |s

≥ |Xn,i′j ′Yj ′ |s

for any i ′ = 1, . . . , , j ′ = 1, . . . , m. In view of the independence between Xn and Y ,it follows that E|Xn,i ′j ′Yj ′ |s = E|Xn,i ′j ′ |sE|Yj ′ |s → 0 a.s. as n→∞. Since E|Yj ′ |s is astrictly positive number, we obtain E|Xn,i ′j ′ |s → 0 a.s., for all i ′, j ′. Using (a + b)s ≤as + bs once again, it follows that

E‖Xn‖s = E

∑i,j

|Xn,ij |s

≤∑i,j

E|Xn,ij |s → 0.

Page 394: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

378 GARCH MODELS

4. Note that the previous question does not allow us to affirm that the convergence to 0 ofE(‖AkAk−1 . . . A1b0‖s) entails that of E(‖AkAk−1 . . . A1‖s), because b0 has zero compo-nents. For k large enough, however, we have

E(‖AkAk−1 . . . A1b0‖s ) = E(‖AkAk−1 . . . AN+1Y‖s)

where Y = AN . . . A1b0 is independent of AkAk−1 . . . AN+1. The general term ai,j ofAN . . . A1 is the (i, j)th term of the matrix AN multiplied by a product of η2

t vari-ables. The assumption AN > 0 entails ai,j > 0 a.s. for all i and j . It follows that theith component of Y satisfies Yi > 0 a.s. for all i. Thus EY s

i > 0. Now the previousquestion allows to affirm that E(‖AkAk−1 . . . AN+1‖s)→ 0 and, by strict stationarity,that E(‖Ak−NAk−N−1 . . . A1‖s )→ 0 as k→∞. It follows that there exists k0 such thatE(‖Ak0Ak0−1 . . . A1‖s) < 1.

5. If α1 or β1 is strictly positive, the elements of the first two lines of the vector A2b are alsostrictly positive, together with those of the (q + 1)th and (q + 2)th lines. By induction, itcan be shown that Amax(p,q)b0 > 0 under this assumption.

6. The condition ANb0 > 0 can be satisfied when α1 = β1 = 0. It suffices to consider anARCH(3) process with α1 = 0, α2 > 0, α3 > 0, and to check that A4b0 > 0.

2.14 In the case p = 1, the condition on the roots of 1− β1z implies |β| < 1. The positivityconditions on the φi yield

φ0 = ω/(1− β)> 0,

φ1 = α1 ≥ 0,

φ2 = β1α1 + α2 ≥ 0,

φq−1 = βq−11 α1 + β

q−21 α2 + · · · + β1αq−1 + αq ≥ 0,

φk = βk−q+11 φq−1 ≥ 0, k ≥ q.

The last inequalities imply β1 ≥ 0. Finally, the positivity constraints are

ω> 0, 0 ≤ β1 < 1,k+1∑i=1

αiβk+1−i1 , k = 0, . . . , q − 1.

If q = 2, these constraints reduce to

ω> 0, 0 ≤ β1 < 1, α1 ≥ 0, β1α1 + α2 ≥ 0.

Thus, we can have α2 < 0.

2.15 Using the ARCH(q) representation of the process (ε2t ), together with Proposition 2.2, we

obtain

ρε2(i) = α1ρε2(i − 1)+ · · · + αi−1ρε2(1)+ αi + αi+1ρε2 (1)+ · · · + αqρε2(q − i) ≥ αi .

2.16 Since ρε2(h) = α1ρε2(h− 1)+ α2ρε2 (h− 2), h> 0, we have ρε2 (h) = λrh1 + µrh2 whereλ, µ are constants and r1, r2 satisfy r1 + r2 = α1, r1r2 = −α2. It can be assumed that r2 < 0and r1 > 0, for instance. A simple computation shows that, for all h ≥ 0,

ρε2(2h+ 1) < ρε2 (2h) ⇐⇒ µ(r2 − 1)r2h2 < λ(1− r1)r

2h1 .

Page 395: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 379

If the last equality is true, it remains true when h is replaced by h+ 1 because r22 < r2

1 . Sinceρε2(1) < ρε2(0), it follows that ρε2 (2h+ 1) < ρε2 (2h) for all h ≥ 0. Moreover,

ρε2(2h)>ρε2(2h− 1) ⇐⇒ µ(r2 − 1)r2h−12 >λ(1− r1)r

2h−11 .

Since r22 < r2

1 , if ρε2 (2) < ρε2(1) then we have, for all h ≥ 1, ρε2(2h) < ρε2(2h− 1). Wehave thus shown that the sequence ρε2 (h) is decreasing when ρε2(2) < ρε2(1). If ρε2(2) ≥ρε2(1)> 0, it can be seen that for h large enough, say h ≥ h0, we have ρε2(2h) < ρε2 (2h− 1),again because of r2

2 < r21 . Thus, the sequence

{ρε2 (h)

}h≥h0

is decreasing.

2.17 Since Xn + Yn →−∞ in probability, for all K we have

P (Xn + Yn < K)

≤ P (Xn < K/2 or Yn < K/2)

= P (Yn < K/2)+ P (Xn < K/2) {1− P (Yn < K/2)} → 1.

Since Xn → −∞ in probability, there exist K0 ∈ R and n0 ∈ N such that P (Xn < K0/2) ≤ς < 1 for all n ≥ n0. Consequently,

P (Yn < K/2)+ ς {1− P (Yn < K/2)} = 1+ (ς − 1)P (Yn ≥ K/2)→ 1

as n→∞, for all K ≤ K0, which entails the result.

2.18 We have

[a(ηt−1) . . . a(ηt−n)ω(ηt−n−1)]1/n

= exp

[1

n

n∑i=1

log{a(ηt−i )} + ω(ηt−n−1)

]→ eγ a.s.

as n→∞. If γ < 0, the Cauchy rule entails that

ht = ω(ηt−1)+∞∑i=1

a(ηt−1) . . . a(ηt−i )ω(ηt−i−1),

converges almost surely, and the process (εt ), defined by εt =√htηt , is a strictly stationary

solution of (2.7). As in the proof of Theorem 2.1, it can be shown that this solution is unique,nonanticipative and ergodic. The converse is proved by contradiction, assuming that thereexists a strictly stationary solution (εt , σ

2t ). For all n> 0, we have

σ 20 ≥ ω(η−1)+

n∑i=1

a(η−1) . . . a(η−i )ω(η−i−1).

It follows that a(η−1) . . . a(η−n)ω(η−n−1) converges to zero, a.s., as n→∞, or, equivalently,that

n∑i=1

log a(ηi)+ logω(η−n−1)→−∞ a.s. as n→∞. (C.2)

We first assume that E log{a(ηt )}> 0. Then the strong law of large numbers entails∑ni=1 log a(ηi)→+∞ a.s. For (C.2) to hold true, it is then necessary that logω(η−n−1)→

−∞ a.s., which is precluded since (ηt ) is iid and ω(η0)> 0 a.s. Assume now thatE log{a(ηt )} = 0. By the Chung–Fuchs theorem, we have lim sup

∑ni=1 log a(ηi) = +∞ with

Page 396: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

380 GARCH MODELS

probability 1 and, using Exercise 2.17, the convergence (C.2) entails logω(η−n−1)→−∞in probability, which, as in the previous case, entails a contradiction.

2.19 Letting a(z) = λ+ (1− λ)z2, we have

σ 2t = a(ηt−1)σ

2t−1 = a(ηt−1) · · · a(η1)

{λσ 2

0 + (1− λ)σ 20 η

20

}.

Regardless of the value of σ 20 > 0, fixed or even random, we have almost surely

1

tlog σ 2

t =1

tlog{λσ 2

0 + (1− λ)σ 20 η

20

}+ 1

t

t−1∑k=1

log a(ηk)

→ E log a(ηk) < logEa(ηk) = 0

using the law of large numbers and Jensen’s inequality. It follows that σ 2t → 0 almost surely

as t →∞.

2.20 .1. Since the φi are positive and A1 = 1, we have φi ≤ 1, which shows the first inequality.The second inequality follows by convexity of x �→ x log x for x > 0.

2. Since A1 = 1 and Ap <∞, the function f is well defined for q ∈ [p, 1]. We have

f (q) = log∞∑i=1

φq

i + logE|η0|2q .

The function q �→ logE|η0|2q is convex on [p, 1] if, for all λ ∈ [0, 1] and all q, q∗ ∈[p, 1],

logE|η0|2λq+2(1−λ)q∗ ≤ λ logE|η0|2q + (1− λ) logE|η0|2q∗ ,

which is equivalent to showing that

E[XλY 1−λ] ≤ [EX]λ[EY ]1−λ,

with X = |η0|2q , Y = |η0|2q∗ . This inequality holds true by Holder’s inequality. The sameargument is used to show the convexity of q �→ log

∑∞i=1 φ

q

i . It follows that f is convex,as a sum of convex functions. We have f (1) = 0 and f (p) < 0, thus the left derivativeof f at 1 is negative, which gives the result.

3. Conversely, we assume that there exists p∗ ∈ (0, 1] such that Ap∗ <∞ and that (2.52) issatisfied. The convexity of f on [p∗, 1] and (2.52) imply that f (q) < 0 for q sufficientlyclose to 1. Thus (2.41) is satisfied. By convexity of f and since f (1) = 0, we havef (q) < 0 for all q ∈ [p, 1[. It follows that, by Theorem 2.6, E|εt |q <∞ for all q ∈ [0, 2[.

2.21 Since E(ε2t | εu, u < t) = σ 2

t , we have a = 0 and b = 1. Using (2.60), we can easily see that

Var(σ 2t )

Var(ε2t )= α2

1

1− 2α1β1 − β21

<1

κη,

since the condition for the existence of Eε4t is 1− 2α1β1 − β2

1 >α21κη. Note that when the

GARCH effect is weak (that is, α1 is small), the part of the variance that is explained bythis regression is small, which is not surprising. In all cases, the ratio of the variances isbounded by 1/κη, which is largely less than 1 for most distributions (1/3 for the Gaussiandistribution). Thus, it is not surprising to observe disappointing R2 values when estimatingsuch a regression on real series.

Page 397: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 381

Chapter 3

3.1 Given any initial measure, the sequence (Xt)t∈N clearly constitutes a Markov chain on(R,B(R)), with transition probabilities defined by P(x,B) = P(X1 ∈ B | X0 = x) = Pε(B −θx).

(a) Since Pε admits a positive density on R, the probability measure P(x, .) is, for allx ∈ E, absolutely continuous with respect to λ and its density is positive on R. Thus anymeasure ϕ which is absolutely continuous with respect to λ is a measure of irreducibility:∀x ∈ E,

ϕ(B)> 0 ⇒ λ(B)> 0 ⇒ P(x,B)> 0.

Moreover λ is a maximal measure of irreducibility.(b) Assume, for example, that εt is uniformly distributed on [−1, 1]. If θ > 1 and X0 =

x0 > 1/(θ − 1), we have x0 < X1 < X2 < . . ., regardless of the εt . Thus there exists noirreducibility measure: such a measure should satisfy ϕ(]−∞, x]) = 0, for all x ∈ R, whichwould imply ϕ = 0.

3.2 If (Xn) is strictly stationary, X1 and X0 have the same distribution, µ, satisfying

∀B ∈ B, µ(B) = P(X0 ∈ B) = P(X1 ∈ B) =∫P(x, B)dµ(x).

Thus µ is an invariant probability measure.Conversely, suppose that µ is invariant. Using the Chapman–Kolmogorov relation, by

which ∀t ∈ N, ∀s, 0 ≤ s ≤ t, ∀x ∈ E, ∀B ∈ E,

P t (x, B) =∫y∈E

P s(x, dy)P t−s (y, B),

we obtain

µ(B) = P[X1 ∈ B] =∫x

[∫y

P (y, dx)µ(dy)

]P(x, B)

=∫y

µ(dy)

∫x

P (y, dx)P (x, B) =∫µ(dy)P 2(y, B) = P[X2 ∈ B].

Thus, by induction, for all t , P[Xt ∈ B] = µ(B) (∀B ∈ B). Using the Markov property,this is equivalent to the strict stationarity of the chain: the distribution of the process(Xt ,Xt+1, . . . , Xt+k) is independent of t , for any integer k.

3.3 We have

π(B) = limt→+∞

∫µ(dx)P t (x, B)

= limt→+∞

∫x

µ(dx)

∫y

P t−1(x, dy)P (y, B)

=∫y

P (y, B) limt→+∞

∫x

P t−1(x, dy)µ(dx)

=∫P(y, B)π(dy).

Thus π is invariant. The third equality is an immediate consequence of the Fubini andLebesgue theorems.

Page 398: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

382 GARCH MODELS

3.4 Assume, for instance, θ > 0. Let C = [−c, c], c > 0, and let δ = inf{f (x); x ∈[−(1+ θ)c, (1+ θ)c]}. We have, for all A ⊂ C and all x ∈ C,

P(x,A) =∫A

f (y − θx)dλ(y) ≥ δλ(A).

Now let B ∈ E. Then for all x ∈ C,

P 2(x, B) =∫E

P (x, dy)P (y, B)

≥∫C

P (x, dy)P (y, B)

≥ δ

∫C

λ(dy)P (y, B) := ν(B).

The measure ν is nontrivial since ν(E) = δλ(C) = 2δc> 0.

3.5 It is clear that (Xt) constitutes a Feller chain on R. The λ-irreducibility follows from theassumption that the noise has a density which is everywhere positive, as in Exercise 3.1. Inorder to apply Theorem 3.1, a natural choice of the test function is V (x) = 1+ |x|. We have

E(V (Xt | Xt−1 = x)) ≤ 1+ E(|θ + bεt |)|x| + E(|εt |):= 1+K1|x| +K2 = K1g(x)+K2 + 1−K1.

Thus if K1 < 1, we have, for K1 < K < 1 and for g(x)>(K2 + 1−K1)/(K −K1),

E(V (Xt | Xt−1 = x)) ≤ Kg(x).

If we put A = {x; g(x) = 1+ |x| ≤ (K2 + 1−K1)/(K −K1)}, the set A is compact and theconditions of Theorem 3.1 are satisfied, with 1− δ = K.

3.6 By summing the first n inequalities of (3.11) we obtain

n−1∑t=0

Pt+1V (x0) ≤ (1− δ)

n−1∑t=0

PtV (x0)+ b

n−1∑t=0

P t(x0, A).

It follows that

b

n−1∑t=0

P t (x0, A) ≥ δ

n−1∑t=1

Pt V (x0)+ PnV (x0)− (1− δ)V (x0)

≥ (n− 1)δ + 1− (1− δ)M,

because V ≥ 1. Thus, there exists κ > 0 such that

Qn(x0, A) = 1

n

n∑t=1

P t (x0, A) ≥ n− 1

nbδ + 1− (1− δ)M

nb− 1

n>κ.

Note that the positivity of δ is crucial for the conclusion.

Page 399: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 383

3.7 We have, for any positive continuous function f with compact support,∫E

f (y)π(dy) = limk→∞

∫E

f (y)Qnk (x0, dy)

= limk→∞

{∫E

Pf (y)Qnk (x0, dy)

+ 1

nk

∫E

f (y)(P(x0, dy)− Pnk+1(x0, dy)

)}= lim

k→∞

∫E

Pf (y)Qnk (x0, dy)

≥∫E

Pf (y)π(dy).

The inequality is justified by (i) and the fact that Pf is a continuous positive function. Itfollows that for f = 1lC , where C is a compact set, we obtain

π(C) ≥∫E

P (y, C)π(dy)

which shows that,

∀B ∈ E, π(B) ≥∫E

P (y,B)π(dy)

(that is, π is subinvariant) using (ii). If there existed B such that the previous inequality werestrict, we should have

π(E) = π(B)+ π(Bc)

>

∫E

P (y, B)π(dy)+∫E

P (y, Bc)π(dy) =∫E

P (y, E)π(dy) = π(E),

and since π(E) <∞ we arrive at a contradiction. Thus

∀B ∈ E, π(B) =∫E

P (y, B)π(dy),

which signifies that π is invariant.

3.8 See Francq and Zakoıan (2006a).

3.9 If supn nun were infinite then, for any K > 0, there would exist a subscript n0 such thatn0un0 >K . Then, using the decrease in the sequence, one would have

∑n0k=1 uk ≥ n0un0 >K .

Since this should be true for all K > 0, the sequence would not converge. This applies directlyto the proof of Corollary A.3 with un = {αX(n)}ν/(2+ν), which is indeed a decreasing sequencein view of point (v) on page 348.

3.10 We have

d4 =k−1∑ =1

|Cov (XtXt+h,Xt− Xt+k− )| ≤ d7 + d8,

Page 400: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

384 GARCH MODELS

where

d7 =k−1∑ =1

|Cov (Xt− ,XtXt+hXt+k− )| ,

d8 =k−1∑ =1

|EXtXt+hEXt− Xt+k− | .

Inequality (A.8) shows that d7 is bounded by

K

∞∑ =0

{αX( )}ν/(2+ν).

By an argument used to deal with d6, we obtain

d8 ≤ K supk≥1

(k − 1){αX(k)}ν/(2+ν) <∞,

and the conclusion follows.

3.11 The chain satisfies the Feller condition (i) because

E {g(Xt) | Xt−1 = x} =9∑

k=1

g

(x

10+ k

10

)1

9

is continuous at x when g is continuous.To show that the irreducibility condition (ii) is not satisfied, consider the set of numbers

in [0, 1] such that the sequence of decimals is periodic after a certain lag:

B = {x = 0, u1u2 . . . such that ∃n ≥ 1, (ut )t≥n periodic}.

For all h ≥ 0, Xt ∈ B if and only if Xt+h ∈ B. We thus have,

∀t, P t (Xt ∈ B | X0 = x) = 0 for x /∈ B

and,

∀t, P t (Xt ∈ B | X0 = x) = 0 for x ∈ B.

This shows that there is no nontrivial irreducibility measure.The drift condition (iii) is satisfied with, for instance, a measure φ such that

φ ([−1, 1])> 0, the energy V (x) = 1+ |x| and the compact set A = [−1, 1]. Indeed,

E (|Xt | + 1 | Xt−1 = x) = E(∣∣∣ x

10+ ut

10

∣∣∣+ 1)≤ 3

2+ |x|

10≤ (1− δ)(1+ |x|)

provided

0 < δ ≤910 |x| − 1

2

1+ |x| .

Page 401: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 385

Chapter 4

4.1 We have ε2t = ωt + αt ε

2t−1, where ωt = ωη2

t and αt = αη2t . It follows that

ε2mt = η2

mt (ω + αωmt−1 + · · · + ααmt−1 . . . αmt−m+2ωmt−m+1

+ ααmt−1 . . . αmt−m+1ε2m(t−1))

:= η2mt (ωt + αt ε

2m(t−1)).

Since (εt ) is supposed to be a nonanticipative solution of the model, εm(t−1) is independentof ηmt , . . . ηmt−m+1. It follows that, using the independence of the process (ηt ),{

E(εmt |εm(t−1), εm(t−2), . . .) = 0Var(εmt |εm(t−1), εm(t−2), . . .) = ω(1+ α + · · · + αm−1)+ αmε2

m(t−1).

Thus, the process (εmt ) is a semi-strong ARCH(1).Let

ηt = εmt

{Var(εmt |εm(t−1), εm(t−2), . . .)}1/2

= ηmt

(ωt + αt ε

2m(t−1)

ω(1+ α + · · · + αm−1)+ αmε2m(t−1)

)1/2

= ηmt

(1+ ω∗t + α∗t ε2

m(t−1)

ω(1+ α + · · · + αm−1)+ αmε2m(t−1)

)1/2

,

where ω∗t = ωt − ω(1+ α + · · · + αm−1) and α∗t = αt − αm. As in the case m = 2, we showthat (εmt ) is not a strong ARCH(1) process by showing that (ηt ) is not iid. We assume that(ηt ) is iid and give a proof by contradiction. Note that

ε2m(t−1)

{αm(η2

t − η2mt)− α∗t η

2mt

} = η2mtω

∗t − (η2

t − η2mt )ω(1+ α + · · · + αm−1).

Since the random variable ε2m(t−1) is nondegenerate (assuming that η2

m(t−1) is nondegenerate)and is independent of (ηt , ηmt , ω∗t , α∗t ) (the vector (ω∗t , α∗t ) being a measurable function of{ηmt , ηmt−1, . . . , ηmt−m+1}), we have (see Exercise 11.3)

αm(η2t − η2

mt )− α∗t η2mt = 0

andη2mtω

∗t − (η2

t − η2mt)ω(1+ α + · · · + αm−1) = 0

with probability 1. When α = 0, elementary calculations show that we then have

1+ αη2mt−1 + · · · + αm−1η2

mt−1 · · · η2mt−m+1 = η2

mt−1 · · · η2mt−m+1.

In view of Exercise 11.3 this is impossible, because the law of η2mt−m+1 is nondegenerate.

4.2 By Theorem 4.2, we know that (εmt ) admits a weak GARCH(1, 1) representation. We nowgive a direct proof and compute the coefficients a(m), b(m) and c(m) of this representation.Let

ε2t − aε2

t−1 = c + νt − b1νt−1 − b2νt−2

Page 402: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

386 GARCH MODELS

be the GARCH(1, 2) representation of (εt ). We have

ε2t − amε2

t−m = (1+ aL+ · · · + am−1Lm−1)(1− aL)ε2t

= c(1+ · · · + am−1)+ (1+ · · · + am−1Lm−1)(1− b1L− b2L2)νt

= c(1+ · · · + am−1)+ vt .

The variable vt being function of (νt , νt−1, . . . , νt−m−1), where (νt ) is a noise, the process vmtis an MA(1) process of the form vmt = ν(m),t − b(m)ν(m),t−1. We obtain b(m) as the solution,of modulus less than 1, of the equation

b(m)

1+ b2(m)

= −Cov(vt , vt−m)Var(vt )

= am−2(b1a + b2 + ab2(a − b1))(1− a2)

(1− a2)(1+ b2a2(m−1))+ (a − b)2(1− a2(m−1))+ b2δm,

where δm = 2a(a − b1)− 2a2(m−1)(1+ b1)+ b2(1+ a2(m−2)). We retrieve the resultobtained for the GARCH(1, 1) process when b2 = 0.

4.3 Using the independence between Wt and εt−h for h> 0, and then the independence betweenthe process (Wt) and (et ), we have

Cov(ε2t , ε

2t−h) = Cov(e2

t , ε2t−h)+ Cov(W 2

t , ε2t−h)+ 2Cov(etWt , ε

2t−h)

= Cov(e2t , e

2t−h)+ Cov(e2

t ,W2t−h)+ 2Cov(e2

t , et−hWt−h)

= Cov(e2t , e

2t−h).

Except for the variance, (εt ) and (et ) thus have the same autocovariance structure. Since (et )is a strong GARCH(p, q), the autocovariance structure of its square is determined by

Cov(e2t , e

2t−h) =

max{p,q}∑i=1

(ai + bi)Cov(e2t , e

2t−h+i ), h>p.

The same relation holds true for (ε2t ) whenever h>max{p, q} (since the last term in the

sum is Cov(e2t , e

2t−h+max{p,q}), which cannot be replaced by Cov(ε2

t , ε2t−h+max{p,q}), unless

h>max{p, q}). The ARMA representation of (ε2t ) follows.

4.4 It suffices to verify that

E(vt ) = 0 and E(vtvt−k) = 0, ∀k >max{p, q}.The first equality follows from the fact that (εt ) and (ut ) are noises. For k >max{p, q}, wehave

E(vtvt−k) = 2cq∑i=1

aiE(εt−ivt−k)+∑i =j

aiajE(εt−iεt−j vt−k)

+E(utvt−k)−p∑i=1

biE(ut−ivt−k)

The first two sums are null because (εt ) is a martingale difference. The last two sums arenull because ηt is uncorrelated with any variable belonging to the σ -field generated by thepast of εt .

Page 403: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 387

4.5 .1. The relation follows from the independence between Zt and the vt−j , j = 0.

2. Expanding the equation for σ 2t , we have

σ 2t−1 = A(vt−1)+

h−1∑i=2

B(vt−1) . . . B(vt−i+1)A(vt−i )+ B(vt−1) . . . B(vt−h+1)σ2t−h.

It follows that

Cov(B(vt )σ2t−1, A(vt−h)) = E{B(vt )}Cov(B(vt−1) . . . B(vt−h+1)σ

2t−h, A(vt−h))

= [E{B(vt )}]hCov(σ 2t−h,A(vt−h))

= [E{B(vt )}]hCov(A(vt−h)+ B(vt−h)σ 2t−h−1, A(vt−h))

= [E{B(vt )}]h[Var(A(vt ))+ Cov(A(vt ), B(vt ))E(σ2t )].

Similarly,

Cov(B(vt )σ2t−1, B(vt−h)σ

2t−h−1)

= [E{B(vt )}]hCov(σ 2t−h, B(vt−h)σ

2t−h−1)

= [E{B(vt )}]hCov(A(vt−h)+ B(vt−h)σ 2t−h−1, B(vt−h)σ

2t−h−1)

= [E{B(vt )}]h[Var(B(vt ))(Eσ2t )

2 + E(B(vt )2)Var(σ 2

t )

+ Cov(A(vt ), B(vt ))Eσ2t ].

The conclusion follows.

3. Since the constraint d2 + b2 < 1 entails the second-order stationarity of (σ 2t ), we have

E(σ 2t ) =

c

1− dand Var(σ 2

t ) =[a(1− d)+ bc]2

(1− e)2(1− d2 − b2).

It follows that

γε2 (h) := Cov(ε2t , ε

2t−h) = dh

[a(1− d)+ bc]2

(1− d)2(1− d2 − b2), ∀h> 0.

4. Relation (4.11) follows from the fact that γε2(h) = dγε2(h− 1) for h> 1. We obtain β

from the lag 1 autocorrelation of the process (ε2t ), solving the quadratic equation

(d + β)(1+ eβ)

1+ 2eβ + β2= e[a(1− e)+ bc]2

E(Z4t )([a(1− e)+ bc]2 + c2(1− e2 − b2))

.

4.6 The chain being iid, the rows of its transition probability matrix are equal. It follows that thematrix is of rank 1. The unique nonzero eigenvalue is thus the trace, which is equal to 1. Inthis case, all the λk in (4.9) are equal to zero, and we have(

I −max{p,q}∑

i=1

(ai + bi)Li

)ε2t = ω +

I + p+K−1∑i=1

βiLi

ut .

Note that the AR part is the same as when ω is constant.

4.7 .1. Eε2t =

∑2i=1 ωiπ(i).

2. The matrix P of the transition probabilities admits the eigenvalues 1 and λ. Note that−1 < λ < 1 by the irreducibility and aperiodicity assumptions. Diagonalizing P , it is

Page 404: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

388 GARCH MODELS

easy to see that the elements of P k are of the form p(k)(i, j) = a1(i, j)+ a2(i, j)λk ,

for k ≥ 0. Taking the limit as k tends to infinity, we obtain a1(i, j) = π(j), and usingthe value k = 0, we obtain a1(i, j)+ a2(i, j) = 1l{i=j} . It follows that for j = 1, 2and i = j ,

p(k)(i, j) = π(j)(1− λk), p(k)(j, j) = π(j)+ λk(1− π(j)),

and (4.15) follows.

3. Using the independence between the process (ηt ) and (�t ), for k > 0 we have

Cov(ε2t , ε

2t−k) = Cov

{σ 2(�t ), σ

2(�t−k)}

= E{σ 2(�t )σ

2(�t−k)}− {Eσ 2(�t )}2

=2∑

i,j=1

p(k)(i, j)π(i)ωiωj −{

2∑i

π(i)ωi

}2

=2∑

i,j=1

{p(k)(i, j)− π(j)}π(i)ωiωj . (C.3)

Using (4.15), we then have

Cov(ε2t , ε

2t−k) = λk

2∑

j=1

(1− π(j))π(j)σ 4(j)−∑i =j

π(i)π(j)ωiωj

= λk{ω1 − ω2}2π(1)π(2), k > 0. (C.4)

4. Similar calculations show that

Var(ε2t ) = {ω1 − ω2}2π(1)π(2)+ {ω2

1π(1)+ ω22π(2)}Var(η2

t ). (C.5)

5. In view of (C.4), we have Cov(ε2t , ε

2t−k) = λCov(ε2

t , ε2t−k) for k > 1. By (C.5), this relation

is generally not true for k = 1. It follows that ε2t satisfies an ARMA(1, 1) model of

autoregressive coefficient λ.

6. In this case λ = 0, thus (ε2t ) (up to its mean) is a white noise.

7. We obtainε2t − 0.6ε2

t−1 = 1+ ut − 0.427βut−1,

where (ut ) is a noise.

4.8 .1. We verify that (εt ) is a noise, using the independence of the sequence (ηt ) and theexistence of µ4 = Eη4

t . For k > 1, we have Eη2t η

2t−1η

2t−kη

2t−k−1 = 1 and Eη2

t η4t−1η

2t−2 =

µ4. It follows that

Cov(ε2t , ε

2t−k) = 0, k > 1,

Cov(ε2t , ε

2t−1) = µ4 − 1.

Page 405: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 389

Thus (ε2t ) admits an MA(1) representation of the form ε2

t = 1+ ut − θut−1, where (ut )

is a noise and θ is a parameter which depends on µ4. It follows that (εt ) is a weakGARCH(0, 1) process.

2. The process (Xt) := (ε2t − 1) is a weak MA-GARCH if (u2

t ) has an ARMA representation.We have

u2t − θ2u2

t−1 = X2t + 2Xt

∞∑i=1

θ iXt−i := vt .

This equation determines the AR part of the ARMA model. Note that the existenceof E(η8

t ) implies that of E(u4t ), and thus that of E(v2

t ).

In order to show that (vt ) is an MA process, compute its autocovariance. We have

Cov(vt , vt−k) = Cov(X2t , X

2t−k + 2Xt−k

∞∑i=1

θ iXt−k−i )

+ Cov(2θXtXt−1, X2t−k + 2Xt−k

∞∑i=1

θ iXt−k−i )

+ Cov(2Xt

∞∑i=2

θiXt−i , X2t−k + 2Xt−k

∞∑i=1

θ iXt−k−i ).

Since Xt is function of (νt , νt−1), the first covariance on the right-hand side is equal to 0for all k > 1. For the same reason, XtXt−1 is function of (νt , νt−1, νt−2), thus the secondterm is null for all k > 2. Finally, the last covariance is null using E(Xt) = 0 and theindependence between Xt and Xt−k (for all k ≥ 2). We have thus shown that (ε2

t − 1) isa weak ARMA(0, 1)-GARCH(1, 2) process.

4.9 Assume that ε1t and ε2t are two independent weak GARCH processes. Without loss ofgenerality, it can be assumed that these two GARCH models have the same order (r, p)(adding null coefficients if necessary). The existence of an ARMA representation for (ε2

1t )

entails an autocovariance function of the form

γε21(h) =

∑i=1

Pi(h)φhi , ∀h ≥ p,

where the φi (1 ≤ i ≤ ≤ r) are the distinct complex roots of the AR polynomial and thePi are polynomials, the degree of Pi being equal to the order of multiplicity ri of the rootφi . Similarly, we have

γε22(h) =

m∑i=1

Qi(h)ψhi , ∀h ≥ p,

with analogous notation. Using (4.13) with εt = ε1t + ε2t , we obtain

γε2(h) = ∑i=1

Pi(h)φhi +

m∑i=1

Qi(h)ψhi , ∀h ≥ p.

It follows that ε2t is an ARMA process. The roots of the AR polynomial are the φi and ψi .

Thus, if these roots are distinct, the AR polynomial of the aggregated process is obtained bymultiplying the AR polynomials of the processes ε2

it .

Page 406: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

390 GARCH MODELS

Chapter 5

5.1 Let (Ft ) be an increasing sequence of σ -fields such that εt ∈ Ft and E(εt |Ft−1) = 0. Forh> 0, we have εt εt+h ∈ Ft+h and

E(εtεt+h|Ft+h−1) = εtE(εt+h|Ft+h−1) = 0.

The sequence (εt εt+h,Ft+h)t is thus a stationary sequence of square integrable martingaleincrements. We thus have

n1/2γ (h)L→ N(0, Eε2

t ε2t+h),

where γ (h) = n−1∑nt=1 εt εt+h. To conclude,1 it suffices to note that

n1/2γ (h)− n1/2γ (h) = n−1/2n∑

t=n−h+1

εtεt+h → 0

in probability (and even in L2).

5.2 This process is a stationary martingale difference, whose variance is

γ (0) = Eε2t =

ω

1− α.

Its fourth-order moment is

Eε4t = µ4

(ω2 + α2Eε4

t + 2αωEε2t

).

Thus,

Eε4t =

µ4(ω2 + 2αωEε2

t )

1− µ4α2= µ4ω

2(1+ α)

(1− α)(1− µ4α2).

Moreover,

Eε2t ε

2t−1 = E(ω + αε2

t−1)ε2t−1 =

ω2

1− α+ αEε4

t .

Using Exercise 5.1, we thus obtain

n1/2γ (1)L→ N

{0,

ω2(1+ αµ4)

(1− α)(1− µ4α2)

}.

5.3 We have

n1/2ρ(1) = n1/2γ (1)

γ (0).

By the ergodic theorem, the denominator converges in probability (and even a.s.) toγε(0) = ω/(1− α) = 0. In view of Exercise 5.2, the numerator converges in law toN{

0, ω2(1+αµ4)

(1−α)(1−µ4α2)

}. Cramer’s theorem2 then entails

n1/2ρ(1)L→ N

{0,(1− α)(1+ αµ4)

(1− µ4α2)

}.

The asymptotic variance is equal to 1 when α = 0 (that is, when εt is a strong white noise).Figure C.3 shows that the asymptotic distribution of the empirical autocorrelations of aGARCH can be very different from those of a strong white noise.

1 If Xn → x, x constant, and YnL→ Y , then Xn + Yn

L→ x + Y .2 If Yn

L→ Y and Tn → t in probability, t constant, then TnYnL→ Y t .

Page 407: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 391

0.1 0.2 0.3 0.4 0.5

1

a

2

3

4

5

6

Figure C.3 Comparison between the asymptotic variance of√nρ(1) for the ARCH(1) process

(5.28) with (ηt ) Gaussian (solid line) and the asymptotic variance of√nρ(1) when εt is a strong

white noise (dashed line).

5.4 Using Exercise 2.8, we obtain

Eε2t ε

2t+h = γε2 (h)+ γ 2(0).

In view of Exercises 5.1 and 5.3,

n1/2ρ(h)L→ N {0, Eε2

t ε2t+h/γ

2(0)}

for any h = 0.

5.5 Let Ft be the σ -field generated by {ηu, u ≤ t}. If s + 2> t + 1, then

Eεtεt+1εsεs+2 = E {E(εt εt+1εsεs+2|Fs+1)} = 0.

Similarly, Eεtεt+1εsεs+2 = 0 when t + 1>s + 2. When t + 1 = s + 2, we have

Eεtεt+1εsεs+2 = Eεt−1εt ε2t+1 = Eεt−1εt (ω + αε2

t + βσ 2t )η

2t+1

= Eεt−1σtηt (ω + ασ 2t η

2t + βσ 2

t ) = 0,

because εt−1σt ∈ Ft−1, εt−1σ3t ∈ Ft−1, E(ηt |Ft−1) = Eηt = 0 and E(η3

t |Ft−1) = Eη3t = 0.

Using (7.24), the result can be extended to show that Eεtεt+hεsεs+k = 0 when k = h and(εt ) follows a GARCH(p, q), with a symmetric distribution for ηt .

5.6 Since Eεtεt+1 = 0, we have Cov {εt εt+1, εsεs+2} = Eεtεt+1εsεs+2 = 0 in view ofExercise 5.5. Thus

Cov{n1/2ρ(1), n1/2ρ(2)

} = n−1n−1∑t=1

n−2∑s=1

Cov {εt εt+1, εsεs+2} = 0.

5.7 In view of Exercise 2.8, we have

γε(0) = ω

1+ α + β,

γε2(0) = 2ω2(1+ α + β)

(1− α − β)(1− 3α2 − β2 − 2αβ)

(1+ α2

1− (α + β)2

),

ρε2(1) = α(1− β2 − αβ

)1− β2 − 2αβ

, ρε2(h) = (α + β)ρε2(h− 1), ∀h> 1.

Page 408: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

392 GARCH MODELS

with ω = 1, α = 0.3 and β = 0.55. Thus γε(0) = 6.667, γε2 (0) = 335.043, ρε2(1) =0.434694. Thus

Eε2t ε

2t+i

γ 2ε (0)

= 0.85i−1γε2 (0)ρε2(1)+ γ 2ε (0)

γ 2ε (0)

for i = 1, . . . , 5. Finally, using Theorem 5.1,

limn→∞Var

√nρ5 =

4.27692 0 0 0 0

0 3.78538 0 0 00 0 3.36758 0 00 0 0 3.01244 00 0 0 0 2.71057

.

5.8 Since γX( ) = 0, for all | |>q, we clearly have

vi,i =+q∑

=−qρ2X( ).

Since ρε2 ( ) = α| |, γε(0) = ω/(1− α) and

γε2(0) = 2ω2

(1− α)2(1− 3α2)

(see, for instance, Exercise 2.8), we have

v∗i,i =2

1− 3α2

α| |ρX( + i) {ρX( + i)+ ρX( − i)}

= 2

1− 3α2

q∑ =−q

αi− ρ2X( ).

Note that v∗i,i → 0 as i →∞.

5.9 Conditionally on initial values, the score vector is given by

∂�n(θ)

∂θ= −1

2

n∑t=1

∂θ

{ε2t (θ)

σ 2+ log σ 2

}

=(

1σ 2

∑nt=1 εt (θ)

∂Fθ (Wt )

∂β+ 1

2

(ε2t (θ)

σ 2 − 1)∂σ 2

∂β

1σ 2

∑nt=1 εt (θ)

∂Fθ (Wt )∂ψ

)

where εt (θ) = Yt − Fθ(Wt ). We thus have

I22 = 1

σ 2E

{∂Fθ0 (Wt )

∂ψ

}{∂Fθ0(Wt )

∂ψ ′

},

and, when σ 2 does not depend on θ ,

I11 = 1

σ 2

{∂Fθ0(Wt )

∂β

}{∂Fθ0 (Wt )

∂β ′

}, I12 = 1

σ 2E

{∂Fθ0 (Wt )

∂β

}{∂Fθ0 (Wt )

∂ψ ′

}.

Page 409: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 393

5.10 In the notation of Section 5.4.1 and denoting by β = (β ′1, β′2)′ the parameter of interest, the

log-likelihood is equal to

�n(β) = − 1

2σ 2‖Y −X1β1 −X2β2‖2,

up to a constant. The constrained estimator is βc = (βc′

1 , 0′)′, with βc1 = (X′1X1)−1X′1Y . The

constrained score and the Lagrange multiplier are related by

∂β�n(β

c) =(

), λ = 1

σ 2X′2U

c, U c = Y −X1βc1 .

On the other hand, the exact laws of the estimators under H0 are given by

√n(β − β) ∼ N (0, I−1) , I = 1

nσ 2X′X, X = (X1, X2)

and1√nλ ∼ N {0, (I 22)−1} , I 22 =

(I22 − I21I

−111 I12

)−1

with

Iij = 1

nσ 2X′iXj .

For the case X′1X2 = 0, we can estimate I 22 by

I 22 = I−122 , I22 = 1

nσ 2cX′2X2, nσ 2c = ‖U c‖2.

The test statistic is then equal to

LMn = λ′I 22λ = nUc′X2

(X2X

′2

)−1X′2U

c

U c′ U c= n

σ 2c − σ 2

σ 2c= n(1− R2), (C.6)

withnσ 2c = ‖U c‖2, U c = Y −Xβ,

and where R2 is the coefficient of determination (centered if X1 admits a constant column)in the regression of U c on the columns of X2. For the first equality of (C.6), we use the factthat in a regression model of the form Y = Xβ + U , with obvious notation, Pythagoras’stheorem yields

Y ′X(X′X)−1X′Y = ‖Y‖2 = ‖Y‖2 − ‖U‖2.

In the general case, we have

LMn = nU c′X2

{X2X

′2 −X2X

′1(X1X

′1)−1X1X

′2

}−1X′2U

c

U c′ U c= n

σ 2c − σ 2

σ 2c.

Since the residuals of the regression of Y on the columns of X1 and X2 are also the residualsof the regression of U c on the columns of X1 and X2, we obtain LMn by:

1. computing the residuals U c of the regression of Y on the columns of X1;

2. regressing U c on the columns of X2 and X1, and setting LMn = nR2, where R2 is thecoefficient of determination of this second regression.

Page 410: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

394 GARCH MODELS

5.11 Since y = 0, it is clear that R2nc = R2, with R2

nc and R2 defined in (5.29) and (5.30).Since T is invertible, we have Col(X) = Col(X), where Col(Z) denotes the vectorial

subspace generated by the columns of the matrix Z, and

PX = XT (T ′X′XT )−1T ′X′ = PX.

If e ∈ Col(X) then e ∈ Col(X) and

PXY = cPXY + dPXe = cY + de.

Noting that y := n−1∑nt=1 Yt = cy + d = d , we conclude that

‖PXY − ye‖2

‖Y − ye‖2= c2‖Y‖2

c2‖Y‖2= R2

nc.

5.12 Figure C.4 and Tables C.1–C.6 lead us to select the conditionally heteroscedastic weak whitenoise model for the S&P 500 and the DAX, but one can also try several ARMA models onthe S&P 500 (see Table C.7). From Table C.8, a GARCH(1, 1) seems plausible for the S&P500. For the DAX index, we can envisage the GARCH(2, 1) and GARCH(2, 2) models inparticular (see Table C.6).

0 5 10 15 20 25 30 35

0.0

0.1

0.2

0.3

0.4

0.0

0.1

0.2

0.3

0.4

0.0

0.1

0.2

0.3

0.4

0.0

0.1

0.2

0.3

0.4

Aut

ocor

rela

tions

SP Returns

0 5 10 15 20 25 30 35

Aut

ocor

rela

tions

DAX Returns

0 5 10 15 20 25 30 35

Aut

ocor

rela

tions

Squared SP Returns

0 5 10 15 20 25 30 35

Aut

ocor

rela

tions

Squared DAX Returns

Figure C.4 Correlograms of the returns and squares of the returns of the S&P 500 index fromMarch 2, 1990 to December 29, 2006 and of the DAX index from November 27, 1990 to April 4,2007.

Page 411: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 395

Table C.1 Portmanteau tests on the squares of the DAX and S&P 500 returns.

Tests for noncorrelation of the squared DAX returns

m 1 2 3 4 5 6ρε2(m) 0.185 0.284 0.246 0.241 0.239 0.279σρ

ε2 (m) 0.031 0.031 0.031 0.031 0.031 0.031QLBm 141.144 475.003 725.039 964.026 1201.023 1523.802

p-value 0.000 0.000 0.000 0.000 0.000 0.000

m 7 8 9 10 11 12ρε2(m) 0.230 0.270 0.186 0.214 0.213 0.215σρ

ε2 (m) 0.031 0.031 0.031 0.031 0.031 0.031QLBm 1742.534 2043.226 2186.525 2376.318 2564.566 2755.603

p-value 0.000 0.000 0.000 0.000 0.000 0.000

Tests for noncorrelation of the squared S&P 500 returns

m 1 2 3 4 5 6ρε2(m) 0.204 0.200 0.192 0.148 0.203 0.141σρ

ε2 (m) 0.030 0.030 0.030 0.030 0.030 0.030QLBm 177.035 346.639 503.346 595.875 771.254 855.922

p-value 0.000 0.000 0.000 0.000 0.000 0.000

m 7 8 9 10 11 12ρε2(m) 0.160 0.151 0.115 0.148 0.141 0.135σρ

ε2 (m) 0.030 0.030 0.030 0.030 0.030 0.030QLBm 964.803 1061.963 1118.258 1211.899 1296.512 1374.324

p-value 0.000 0.000 0.000 0.000 0.000 0.000

Table C.2 LM tests for conditional homoscedasticity of the DAX and S&P 500 returns.

Tests for absence of ARCH effect for the DAX

m 1 2 3 4 5 6 7 8 9LMn 141.0 408.3 524.3 590.0 640.5 723.4 746.9 789.0 789.3p-value 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Tests for absence of ARCH effect for the S&P 500

m 1 2 3 4 5 6 7 8 9LMn 176.9 287.7 358.0 376.7 442.0 449.8 467.8 478.6 479.6p-value 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Page 412: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

396 GARCH MODELS

Table C.3 Portmanteau tests on the series of DAX returns from November 27, 1990 to April 4,2007.

Tests for GARCH noise based on Qm

m 1 2 3 4 5 6 7 8ρ(m) −0.006 −0.007 −0.040 −0.036 −0.019 −0.042 −0.016 −0.031σρ(m) 0.044 0.050 0.047 0.047 0.047 0.049 0.047 0.049Qm 0.071 0.148 2.933 5.140 5.777 8.532 8.998 10.578p-value 0.790 0.929 0.402 0.273 0.328 0.202 0.253 0.227

m 9 10 11 12 13 14 15 16ρ(m) −0.002 −0.025 0.027 −0.002 −0.002 0.040 0.018 −0.009σρ(m) 0.044 0.046 0.046 0.046 0.044 0.046 0.043 0.046Qm 10.583 11.739 13.095 13.103 13.110 16.017 16.694 16.834p-value 0.305 0.303 0.287 0.362 0.439 0.312 0.338 0.396

Usual tests, for strong white noise

m 1 2 3 4 5 6 7 8ρ(m) 0.016 −0.020 −0.045 0.015 −0.041 −0.023 −0.025 0.014σρ(m) 0.030 0.030 0.030 0.030 0.030 0.030 0.030 0.030QLBm 1.105 2.882 11.614 12.611 19.858 22.134 24.826 25.629

p-value 0.293 0.237 0.009 0.013 0.001 0.001 0.001 0.001

m 9 10 11 12 13 14 15 16ρ(m) 0.000 0.011 0.010 −0.014 0.020 0.024 0.037 0.001σρ(m) 0.030 0.030 0.030 0.030 0.030 0.030 0.030 0.030QLBm 25.629 26.109 26.579 27.397 29.059 31.497 37.271 37.279

p-value 0.002 0.004 0.005 0.007 0.006 0.005 0.001 0.002

Table C.4 Portmanteau tests on the S&P 500 returns from March 2, 1990 to December 29,2006.

Tests for GARCH noise based on Qm

m 1 2 3 4 5 6 7 8ρ(m) −0.002 −0.027 −0.034 0.007 −0.034 −0.023 −0.043 0.000σρ(m) 0.045 0.045 0.044 0.041 0.045 0.041 0.042 0.042Qm 0.011 1.440 3.665 3.773 6.039 7.243 11.242 11.242p-value 0.915 0.487 0.300 0.438 0.302 0.299 0.128 0.188

m 9 10 11 12 13 14 15 16ρ(m) 0.008 0.013 −0.013 0.049 0.034 0.001 −0.015 0.001σρ(m) 0.039 0.041 0.041 0.041 0.039 0.039 0.038 0.041Qm 11.392 11.793 12.212 17.734 20.630 20.633 21.255 21.258p-value 0.250 0.299 0.348 0.124 0.081 0.111 0.129 0.169

Usual tests for strong white noise

m 1 2 3 4 5 6 7 8ρ(m) −0.002 −0.027 −0.034 0.007 −0.034 −0.023 −0.043 0.000σρ(m) 0.030 0.030 0.030 0.030 0.030 0.030 0.030 0.030QLBm 0.025 3.183 7.964 8.166 13.176 15.398 23.280 23.280

p-value 0.874 0.204 0.047 0.086 0.022 0.017 0.002 0.003

m 9 10 11 12 13 14 15 16ρ(m) 0.008 0.013 −0.013 0.049 0.034 0.001 −0.015 0.001σρ(m) 0.030 0.030 0.030 0.030 0.030 0.030 0.030 0.030QLBm 23.534 24.294 25.069 35.140 40.052 40.058 41.067 41.073

p-value 0.005 0.007 0.009 0.000 0.000 0.000 0.000 0.001

Page 413: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 397

Table C.5 Studentized statistics for the corner method and selected ARMA orders, on the DAXreturns.

.p.|.q..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...

1 | -0.3 -0.3 -1.7 1.5 -0.8 -1.7 -0.7 1.3 -0.1 -1.1 1.2 0.1 -0.1 1.8 0.8

2 | 0.3 -0.2 0.9 0.2 1.1 0.7 1.0 0.6 0.8 0.5 0.6 0.1 -0.1 0.9

3 | -1.7 0.9 -0.6 -0.6 -0.9 0.1 -0.6 -0.1 0.2 0.1 0.4 0.6 0.7

4 | -1.5 0.2 0.6 -0.7 0.7 -0.5 0.4 0.4 0.1 -0.2 0.1 0.3

5 | -0.8 1.1 -0.9 0.7 -0.5 0.3 -0.4 -0.1 0.2 0.0 0.2

6 | 1.7 0.7 0.0 -0.5 -0.3 0.3 0.4 0.3 0.3 0.2

7 | -0.6 1.0 -0.6 0.3 -0.4 0.4 -0.3 0.4 0.1

8 | -1.1 0.5 0.2 0.4 0.2 0.3 -0.4 0.4

9 | -0.2 0.8 0.1 0.1 0.2 0.3 0.2

10 | 1.0 0.5 -0.1 -0.1 0.0 0.2

11 | 1.3 0.6 0.3 0.1 0.2

12 | 0.1 0.2 -0.5 0.2

13 | -0.2 0.1 0.8

14 | -2.1 1.1

15 | 0.7

ARMA(P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 0,14) ( 1, 1) (14, 0)

0.100000 1.64 ( 0,14) ( 1, 1) (14, 0)

0.050000 1.96 ( 0, 1) (14, 0)

0.020000 2.33 ( 0, 0)

0.010000 2.58 ( 0, 0)

0.005000 2.81 ( 0, 0)

0.002000 3.09 ( 0, 0)

0.001000 3.29 ( 0, 0)

0.000100 3.72 ( 0, 0)

0.000010 4.26 ( 0, 0)

Page 414: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

398 GARCH MODELS

Table C.6 Studentized statistics for the corner method and selected ARMA orders, on thesquared DAX returns.

.max(p,q).|.p..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...

1 | 4.4 6.0 7.0 4.4 5.1 4.7 4.5 4.7 4.2 4.1 4.0 3.9 4.2 4.5 4.8

2 | -6.6 1.8 -0.4 0.0 -0.9 1.0 -1.0 1.8 -1.9 0.6 0.0 0.9 -1.3 1.4

3 | 8.4 -0.2 0.4 -0.3 0.5 0.4 -0.9 0.8 1.4 0.3 -0.5 0.5 0.3

4 | -5.9 -0.2 0.1 0.6 -0.4 0.4 0.4 0.8 -1.0 0.8 -0.7 0.5

5 | 7.1 -1.1 0.6 -0.4 -0.4 0.3 0.6 0.9 0.7 -0.7 -0.3

6 | -4.2 1.0 -0.4 0.4 -0.4 0.5 -0.5 0.7 -0.7 0.6

7 | 2.1 -1.3 -1.0 0.3 0.7 -0.3 -0.6 0.5 0.3

8 | -3.9 2.8 -1.1 1.0 -1.0 0.7 -0.6 0.5

9 | 0.4 -1.4 2.0 -1.4 0.7 -0.8 0.0

10 | -1.1 0.3 -0.5 1.0 0.7 0.6

11 | 1.8 0.4 -0.1 -0.8 -0.3

12 | -1.5 0.9 -0.7 0.6

13 | 0.3 -0.9 0.2

14 | -0.9 0.6

15 | -0.1

GARCH(p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 4, 1) ( 4, 2) ( 4, 3) ( 4, 4) ( 1, 9) ( 0,12)

0.100000 1.64 ( 3, 1) ( 3, 2) ( 3, 3) ( 1, 9) ( 0,11)

0.050000 1.96 ( 2, 1) ( 2, 2) ( 0, 8)

0.020000 2.33 ( 2, 1) ( 2, 2) ( 0, 8)

0.010000 2.58 ( 2, 1) ( 2, 2) ( 0, 8)

0.005000 2.81 ( 2, 1) ( 2, 2) ( 0, 8)

0.002000 3.09 ( 1, 1) ( 0, 8)

0.001000 3.29 ( 1, 1) ( 0, 8)

0.000100 3.72 ( 1, 1) ( 0, 8)

0.000010 4.26 ( 1, 1) ( 0, 5)

Page 415: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 399

Table C.7 Studentized statistics for the corner method and selected ARMA orders, on the S&P500 returns.

.p.|.q..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...

1 | -0.1 -1.2 -1.5 0.3 -1.5 -1.1 -2.0 0.0 0.4 0.7 -0.7 2.6 1.9 0.1 -0.9

2 | 1.2 0.5 0.8 -1.0 0.8 -0.6 1.0 0.4 0.1 0.5 -0.4 1.5 0.7 0.8

3 | -1.5 0.8 -0.7 -0.8 -0.2 0.8 -0.7 0.5 -0.5 0.3 0.6 1.0 -0.1

4 | -0.3 -1.0 0.8 -0.5 0.6 -0.4 0.4 -0.5 -0.3 0.5 -0.2 0.8

5 | -1.6 0.8 -0.3 0.6 -0.5 -0.2 -0.2 0.5 -0.6 0.7 -0.7

6 | 1.1 -0.6 -0.8 -0.4 0.2 0.2 0.2 -0.5 -0.4 0.1

7 | -2.1 1.0 -0.6 0.4 -0.2 0.2 -0.4 0.4 -0.3

8 | 0.2 0.2 -0.4 -0.6 -0.5 -0.5 -0.4 -0.2

9 | 0.2 0.2 -0.3 -0.1 -0.6 -0.4 -0.3

10 | -0.5 0.3 -0.2 0.3 -0.7 0.1

11 | -0.8 -0.2 0.3 0.0 -0.7

12 | -2.4 1.4 -1.0 0.7

13 | 1.7 0.7 0.1

14 | -0.1 0.5

15 | -0.5

ARMA(P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 0,13) ( 1,12) ( 2, 2) (12, 1) (13, 0)

0.100000 1.64 ( 0,13) ( 1, 1) (13, 0)

0.050000 1.96 ( 0,12) ( 1, 1) (12, 0)

0.020000 2.33 ( 0,12) ( 1, 1) (12, 0)

0.010000 2.58 ( 0,12) ( 1, 0)

0.005000 2.81 ( 0, 0)

0.002000 3.09 ( 0, 0)

0.001000 3.29 ( 0, 0)

0.000100 3.72 ( 0, 0)

0.000010 4.26 ( 0, 0)

Page 416: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

400 GARCH MODELS

Table C.8 Studentized statistics for the corner method and selected ARMA orders, on thesquared S&P 500 returns.

.max(p,q).|.p..1....2....3....4....5....6....7....8....9...10...11...12...13...14...15...

1 | 7.5 6.4 2.6 5.2 6.4 3.0 3.7 3.8 3.2 3.0 3.8 3.7 2.2 4.3 3.1

2 | -4.5 0.1 0.3 -1.5 1.6 -1.1 0.6 0.9 -1.5 0.8 -0.1 0.5 -0.2 -0.1

3 | 1.8 0.3 0.4 1.1 0.5 0.6 0.8 0.9 1.0 0.6 -0.5 0.3 0.3

4 | -3.7 -1.5 -1.0 -0.4 0.2 0.5 0.3 -1.0 -0.8 0.8 -0.3 0.3

5 | 3.4 1.3 0.5 0.2 0.2 0.2 0.4 0.6 0.8 0.5 0.1

6 | -1.5 -1.1 -0.4 0.5 -0.2 -0.2 0.2 -0.3 -0.5 0.5

7 | 3.1 0.7 0.8 0.2 0.4 0.2 -0.1 0.3 0.5

8 | -2.9 0.8 -0.9 -0.9 -0.7 -0.4 -0.3 -0.5

9 | 1.4 -1.4 0.9 -0.6 0.7 -0.4 0.5

10 | -2.7 0.8 -0.7 0.7 -0.5 0.5

11 | 2.8 0.2 -0.3 -0.4 -0.2

12 | -2.4 0.6 -0.3 0.2

13 | 0.4 -0.2 0.2

14 | -0.4 0.1

15 | 0.6

GARCH(p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL

PROBA CRIT MODELS FOUND

0.200000 1.28 ( 2, 1) ( 2, 2) ( 1, 9) ( 0,12)

0.100000 1.64 ( 1, 1) ( 0,12)

0.050000 1.96 ( 1, 1) ( 0,12)

0.020000 2.33 ( 1, 1) ( 0,12)

0.010000 2.58 ( 1, 1) ( 0,11)

0.005000 2.81 ( 1, 1) ( 0, 8)

0.002000 3.09 ( 1, 1) ( 0, 7)

0.001000 3.29 ( 1, 1) ( 0, 5)

0.000100 3.72 ( 1, 1) ( 0, 2)

0.000010 4.26 ( 1, 1) ( 0, 2)

Page 417: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 401

Chapter 6

6.1 From the observations ε1, . . . , εn, we can compute ε2 = n−1∑nt=1 ε

2t and

ρε2(h) = γε2(h)

γε2 (0), γε2(h) = γε2 (−h) = 1

n

n∑t=1+h

(ε2t − ε2)(ε2

t−h − ε2),

for h = 0, . . . , q. We then putα1,1 = ρε2(1)

and then, for k = 2, . . . , q (when q > 1),

αk,k = ρε2(k)−∑k−1i=1 ρε2(k − i)αk−1,i

1−∑k−1i=1 ρε2(i)αk−1,i

,

αk,i = αk−1,i − αk,kαk−1,k−i , i = 1, . . . , k − 1.

With standard notation, the OLS estimators are then

αi = αi,q , ω =(

1−∑

αi

)ε2.

6.2 The assumption that X has full column rank implies that X′X is invertible. Denoting by 〈·, ·〉the scalar product associated with the Euclidean norm, we have⟨

Y −Xθn,X(θn − θ)⟩ = Y ′

{X −X

(X′X

)−1X′X

}(θn − θ) = 0

and

nQn(θ) = ‖Y −Xθ‖2

= ‖Y −Xθn‖2 + ‖X(θn − θ)‖2 + 2⟨Y −Xθn,X(θn − θ)

⟩≥ ‖Y −Xθn‖2 = nQn(θn),

with equality if and only if θ = θn, and we are done.

6.3 We can take n = 2, q = 1, ε0 = 0, ε1 = 1, ε2 = 0. The calculation yields (ω, α)′ = (1,−1).

6.4 Case 3 is not possible, otherwise we would have

ε2t < ε2

t − ω − α1ε2t−1 − α2ε

2t−2

for all t , and consequently ‖Y‖2 < ‖Y −Xθn‖2, which is not possible.Using the data, we obtain θ = (1,−1,−1/2), and thus θ c = θ . Therefore, the constrained

estimate must coincide with one of the following three constrained estimates: that constrainedby α2 = 0, that constrained by α1 = 0, or that constrained by α1 = α2 = 0. The estimate con-strained by α2 = 0 is θ = (7/12,−1/2, 0), and thus does not suit. The estimate constrainedby α1 = 0 yields the desired estimate θ c = (1/4, 0, 1/4).

6.5 First note that ε2t >ω0η

2t . Thus ε2

t = 0 if and only if ηt = 0. The nullity of the ith columnof X, for i > 1, implies that ηn−i+1 = · · · = η2 = η1 = 0. The probability of this event tendsto 0 as n→∞ because, since Eη2

t = 1, we have P(ηt = 0) < 1.

Page 418: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

402 GARCH MODELS

6.6 Introducing an initial value X0, the OLS estimator of φ0 is

φn =(

1

n

n∑t=1

X2t−1

)−11

n

n∑t=1

XtXt−1,

and this estimator satisfies

√n(φn − φ0) =

(1

n

n∑t=1

X2t−1

)−11√n

n∑t=1

εtXt−1.

Under the assumptions of the exercise, the ergodic theorem entails the almost sure conver-gence

1

n

n∑t=1

X2t−1 → EX2

t ,1

n

n∑t=1

XtXt−1 → EXtXt−1 = φ0EX2t ,

and thus the almost sure convergence of φn to φ0. For the consistency the assumption Eε2t <

∞ suffices.If Eε4

t <∞, the sequence (εtXt−1,Ft ) is a stationary and ergodic square integrablemartingale difference, with variance

Var(εtXt−1) = E(σ 2t X

2t−1).

We can see that this expectation exists by expanding the product

σ 2t X

2t−1 =

(ω0 +

q∑i=1

ε2t−i

)( ∞∑i=0

φi0εt−1−i

)2

.

The CLT of Corollary A.1 then implies that

1√n

n∑t=1

εtXt−1L→ N(0, E(σ 2

t X2t−1)),

and thus √n(φn − φ0)

L→ N {0, (EX2t )−2E(σ 2

t X2t−1)

}.

When σ 2t = ω0, the condition Eε2

t <∞ suffices for asymptotic normality.

6.7 By direct verification, A−1A = I .

6.8 .1. Let εt = εt /√ω0. Then (εt ) solves the model

εt =(

1+q∑i=1

α0i ε2t−i

)1/2

ηt .

The parameter ω0 vanishing in this equation, the moments of εt do not depend on it. Itfollows that Eε2m

t = E(√ω0εt )

2m = Kωm0 .

2. and 3. Write M = M(ωk0) to indicate that a matrix M is proportional to ωk0. Partition the

vector Zt−1 =(1, ε2

t−1, . . . , ε2t−q)′

into Zt−1 = (1,Wt−1)′ and, accordingly, the matrices

Page 419: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 403

A and B of Theorem 6.2. Using the previous question and the notation of Exercise 6.7,we obtain

A11 = A11(1), A12 = A′21 = A12(ω0), A22 = A22(ω20).

We then have

A−1 =[A−1

11 + A−111 A12FA21A

−111 (1) −A−1

11 A12F(ω−10 )

−FA21A−111 (ω

−10 ) F (ω−2

0 )

].

Similarly,B11 = B11(ω

20), B12 = B ′21 = B12(ω

30), B22 = B22(ω

40).

It follows that C = A−1BA−1 is of the form

C =[C11(ω

20) C12(ω0)

C21(ω0) C22(1)

].

6.9 .1. Let α = infy∈C ‖x − y‖. Let us show the existence of x∗. Let (xn) be a sequence ofelements of C such that, for all n> 0, ‖x − xn‖2 < α2 + 1/n. Using the median equality‖a + b‖2 + ‖a − b‖2 = 2‖a‖2 + 2‖b‖2, we have

‖xn − xm‖2 = ‖xn − x − (xm − x)‖2

= 2‖xn − x‖2 + 2‖xm − x‖2 − ‖xn − x + (xm − x)‖2

= 2‖xn − x‖2 + 2‖xm − x‖2 − 4‖x − (xm + xn)/2‖2

≤ 2(α2 + 1/n)+ 2(α2 + 1/m)− 4α2 = 2(1/n+ 1/m),

the last inequality being justified by the fact that (xm + xn)/2 ∈ C, the convexity of Cand the definition of α. It follows that (xn) is a Cauchy sequence and, E being a Hilbertspace and therefore a complete metric space, xn converges to some point x∗. Since C isclosed, x∗ ∈ C and ‖x − x∗‖ ≥ α. We have also ‖x − x∗‖ ≤ α, taking the limit on bothsides of the inequality which defines the sequence (xn). It follows that ‖x − x∗‖ = α,which shows the existence.

Assume that there exist two solutions of the minimization problem in C, x∗1 and x∗2 .Using the convexity of C, it is then easy to see that (x∗1 + x∗2 )/2 satisfies

‖x − (x∗1 + x∗2 )/2‖ = ‖x − x∗1‖ = ‖x − x∗2‖.This is possible only if x∗1 = x∗2 (once again using the median equality).

2. Let λ ∈ (0, 1) and y ∈ C. Since C is convex, (1− λ)x∗ + λy ∈ C. Thus

‖x − x∗‖2 ≤ ‖x − {(1− λ)x∗ + λy}‖2 = ‖x − x∗ + λ(x∗ − y)‖2

and, dividing by λ,λ‖x∗ − y‖2 − 2〈x∗ − x, x∗ − y〉 ≥ 0.

Taking the limit as λ tends to 0, we obtain inequality (6.17).Let z such that, for all y ∈ C, 〈z− x, z− y〉 ≤ 0. We have

‖x − z‖2 = 〈z− x, (z − y)+ (y − x)〉≤ 〈z− x, y − x〉 ≤ ‖x − z‖‖x − y‖,

the last inequality being simply the Cauchy–Schwarz inequality. It follows that ‖x − z‖ ≤‖x − y‖, ∀y ∈ C. This property characterizing x∗ in view of part 1, it follows that z = x∗.

Page 420: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

404 GARCH MODELS

6.10 .1. It suffices to show that when C = K , (6.17) is equivalent to (6.18). Since 0 ∈ K , takingy = 0 in (6.17) we obtain 〈x − x∗, x∗〉 ≤ 0. Since x∗ ∈ K and K is a cone, 2x∗ ∈ K . Fory = 2x∗ in (6.17) we obtain 〈x − x∗, x∗〉 ≥ 0, and it follows that 〈x − x∗, x∗〉 = 0. Thesecond equation of (6.18) then follows directly from (6.17). The converse, (6.18) �⇒(6.17), is trivial.

2. Since x∗ ∈ K , then z = λx∗ ∈ K for λ ≥ 0. By (6.18), we have{ 〈λx − z, z〉 = λ2〈x − x∗, x∗〉 = 0,〈λx − z, y〉 = λ〈x − x∗, y〉 ≤ 0, ∀y ∈ K.

It follows that (λx)∗ = z and (a) is shown. The properties (b) are obvious, expanding‖x∗ + (x − x∗)‖2 and using the first equation of (6.18).

6.11 The model is written as Y = X(1)θ (1) +X(2)θ (2) + U. Thus, since M2X(2) = 0, we have

M2Y = M2X(1)θ (1) +M2U. Note that this is a linear model, of parameter θ (1). Noting that

M ′2M2 = M2, since M2 is an orthogonal projection matrix, the form of the estimator follows.

6.12 Since Jn is symmetric, there exists a diagonal matrix Dn and an orthonormal matrix Pnsuch that Jn = PnDnP

′n. For n large enough, the eigenvalues of Jn are positive since J =

limn→∞ Jn is positive definite. Let λn be the smallest eigenvalue of Jn. Denoting by ‖ · ‖ theEuclidean norm, we have

X′nJnXn = X′nPnDnP′nXn ≥ λnX

′nPnP

′nXn = λn‖Xn‖2.

Since limn→∞X′nJnXn = 0 and limn→∞ λn > 0, it follows that limn→∞ ‖Xn‖ = 0, and thusthat Xn converges to the zero vector of Rk .

6.13 Applying the method of Section 6.3.2, we obtain X(1) = (1, 1)′ and thus, by Theorem 6.8,θ cn = (2, 0)′.

Chapter 7

7.1 .1. When j < 0, all the variables involved in the expectation, except εt−j , belong to the σ -fieldgnerated by {εt−j−1, εt−j−2, . . . }. We conclude by taking the expectation conditionally onthe previous σ -field and using the martingale increment property.

2. For j ≥ 0, we note that ε2t is a measurable function of η2

t , . . . , η2t−j+1 and of

ε2t−j , ε

2t−j−1, . . . Thus E

{g(ε2

t , ε2t−1, . . . )|εt−j , εt−j−1, . . . )

}is an even function of the

conditioning variables, denoted by h(ε2t−j , ε

2t−j−1, . . . ).

3. It follows that the expectation involved in the property can be written as

E{E(h(η2

t−jσ2t−j , ε

2t−j−1, . . . )ηt−j σt−j f (εt−j−1, εt−j−2, . . . )

| εt−j−1, εt−j−2, . . .)}

= E

{∫h(x2σ 2

t−j , ε2t−j−1, . . . )xσt−jf (εt−j−1, εt−j−2, . . . )dPη(x)

}= 0.

The latter equality follows from of the nullity of the integral, because the distribution ofηt is symmetric.

Page 421: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 405

7.2 By the Borel–Cantelli lemma, it suffices to show that for all real δ > 0, the series of generalterms P(ρt ε2

t > δ) converges. That is to say,

∞∑t=0

P(ρt ε2t > δ) ≤

∞∑t=0

E(ρtε2t )s

δs= E(ε2s

t )

(1− ρ)δs<∞,

using Markov’s inequality, strict stationarity and the existence of a moment of order s > 0for ε2

t .

7.3 For all κ > 0, the process (Xκt ) is ergodic and admits an expectation. This expectation is finite

since Xκt ≤ κ and (Xκ

t )− = X−t . We thus have, by the standard ergodic theorem,

1

n

n∑t=1

Xt ≥ 1

n

n∑t=1

Xκt → E(Xκ

1 ), a.s. as n→∞.

When κ →∞, the variable Xκ1 increases to X1. Thus by Beppo Levi’s theorem E(Xκ

1 )

converges to E(X1) = +∞. It follows that n−1∑nt=1 Xt tends almost surely to infinity.

7.4 .1. The assumptions made on f and � guarantee that Yt = {infθ∈� Xt(θ)} is a measurablefunction of ηt , ηt−1, . . . . By Theorem A.1, it follows that (Yt ) is stationary and ergodic.

2. If we remove condition (7.94), the property may not be satisfied. For example, let � ={θ1, θ2} and assume that the sequence (Xt (θ1), Xt (θ2)) is iid, with zero mean, each compo-nent being of variance 1 and the covariance between the two components being differentwhen t is even and when t is odd. Each of the two processes (Xt (θ1)) and (Xt(θ2)) is sta-tionary and ergodic (as iid processes). However, Yt = infθ (Xt (θ)) = min(Xt (θ1), Xt (θ2))

is not stationary in general because its distribution depends on the parity of t .

7.5 .1. In view of (7.30) and of the second part of assumption A1, we have

supθ∈�

∣∣Qn(θ)− Qn(θ)∣∣

= supθ∈�

n−1

∣∣∣∣∣n∑t=1

{(2σ 2

t + σ 2t − σ 2

t )(σ2t − σ 2

t )− 2ε2t (σ

2t − σ 2

t )}(θ)

∣∣∣∣∣≤ sup

θ∈�Kn−1

n∑t=1

{(2σ 2

t +Kρt)ρt + 2ε2t ρ

t}→ 0 (C.7)

almost surely. Indeed, on a set of probability 1, we have for all ι > 0,

lim supn→∞

supθ∈�

Kn−1n∑t=1

{(2σ 2

t +Kρt)ρt + 2ε2t ρ

t}

(C.8)

≤ ι lim supn→∞

n−1n∑t=1

{supθ∈�

σ 2t + ε2

t

}

= ι

{Eθ0 sup

θ∈�σ 2t + Eθ0σ

2t (θ0)

}.

Note that Eε2t <∞ and (7.29) entail that Eθ0 supθ∈� σ 2

t (θ) <∞. The limit superior (C.8)being less than any positive number, it is null.

2. Note that νt := ε2t − σ 2

t (θ0) = ε2t − Eθ0

(ε2t |εt−1, . . .

)is the strong innovation of ε2

t . Wethus have orthogonality between νt and any integrable variable which is measurable with

Page 422: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

406 GARCH MODELS

respect to the σ -field generated by {εu, u < t}. It follows that the asymptotic criterion isminimized at θ0:

limn→∞Qn(θ) = Eθ0

{ε2t − σ 2

t (θ0)+ σ 2t (θ0)− σ 2

t (θ)}2

= limn→∞Qn(θ0)+ Eθ0

{σ 2t (θ0)− σ 2

t (θ)}2 + 2Eθ0

[νt{σ 2t (θ0)− σ 2

t (θ)}]

= limn→∞Qn(θ0)+ Eθ0

{σ 2t (θ0)− σ 2

t (θ)}2 ≥ lim

n→∞Qn(θ0),

with equality if and only if σ 2t (θ) = σ 2

t (θ0) Pθ0 -almost surely, that is, θ = θ0 (by assump-tions A3 and A4; see the proof of Theorem 7.1).

3. We conclude that θn is strongly consistent, as in (d) in the proof of Theorem 7.1, usinga compactness argument and applying the ergodic theorem to show that, at any point θ1,there exists a neighborhood V (θ1) of θ1 such that

if θ1 ∈ �, θ1 = θ0, lim infn→∞ inf

θ∈V (θ1)Q(θ)> lim

n→∞Q(θ0) a.s.

4. Since all we have done remains valid when � is replaced by any smaller compact setcontaining θ0, for instance �c, the estimator θ cn is strongly consistent.

7.6 We know that θn minimizes, over �,

ln(θ) = n−1n∑t=1

ε2t

σ 2t

+ log σ 2t .

For all c > 0, there exists θ ∗n such that σ 2t (θ

∗n ) = cσ 2

t (θn) for all t ≥ 0. Note that θ∗n =θn if and only if c = 1. For instance, for a GARCH(1, 1) model, if θn = (ω, α1, β1) wehave θ∗n = (cω, cα1, β1). Let f (c) = ln(θ∗n ). The minimum of f is obtained at the uniquepoint

c = n−1n∑t=1

ε2t

σ 2t (θn)

.

For this value c, we have θ∗n = θn. It follows that c = 1 with probability 1, which proves theresult.

7.7 The expression for I1 is a trivial consequence of (7.74) and Cov(1− η2t , ηt ) = 0. Similarly,

the form of I2 directly follows from (7.38). Now consider the nondiagonal blocks. Using(7.38) and (7.74), we obtain

Eϕ0

{∂ t

∂θi

∂ t

∂ϑj(ϕ0)

}= E(1− η2

t )2Eϕ0

{1

σ 4t

∂σ 2t

∂θi

∂σ 2t

∂ϑj(ϕ0)

}.

In view of (7.41), (7.42), (7.79) and (7.24), we have

Eϕ0

{1

σ 4t

∂σ 2t

∂ω

∂σ 2t

∂ϑj(ϕ0)

}

=∞∑

k1,k2=0

Bk10 (1, 1)Bk2

0 (1, 1)q∑i=1

2α0iEϕ0

{σ−4t εt−k2−i

∂εt−k2−i∂ϑj

(ϕ0)

}= 0,

Page 423: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 407

Eϕ0

{1

σ 4t

∂σ 2t

∂αi0

∂σ 2t

∂ϑj(ϕ0)

}

=∞∑

k1,k2=0

Bk10 (1, 1)Bk2

0 (1, 1)q∑i=1

2α0iEϕ0

{σ−4t ε2

t−k1−i0εt−k2−i∂εt−k2−i∂ϑj

(ϕ0)

}= 0

and

Eϕ0

{1

σ 4t

∂σ 2t

∂βj0

∂σ 2t

∂ϑj(ϕ0)

}

=∞∑

k1,k2=0

{k1∑ =1

B −10 B

(j0)0 B

k1− 0

}(1, 1)Bk2

0 (1, 1)q∑i=1

2α0i

×Eϕ0

{σ−4t

(ω0 +

q∑i′=1

α0i′ε2t−k1−i′

)εt−k2−i

∂εt−k2−i∂ϑj

(ϕ0)

}= 0.

It follows that

∀i, j, Eϕ0

{1

σ 4t

∂σ 2t

∂θi

∂σ 2t

∂ϑj(ϕ0)

}= 0 (C.9)

and I is block-diagonal. It is easy to see that J has the form given in the theorem. Theexpressions for J1 and J2 follow directly from (7.39) and (7.75). The block-diagonal formfollows from (7.76) and (C.9).

7.8 .1. We have εt = Xt − aXt−1, σ2t = 1+ αε2

t−1. The parameters to be estimated are a and α,ω being known. We have

∂εt

∂a= −Xt−1,

∂σ 2t

∂a= −2αεt−1Xt−2,

∂σ 2t

∂α= ε2

t−1,

∂2σ 2t

∂a2= 2αX2

t−2,∂2σ 2

t

∂a∂α= −2αεt−1Xt−2,

∂2σ 2t

∂α2= 0.

It follows that

∂ t

∂a(ϕ0) = −(1− η2

t )2α0εt−1Xt−2

σ 2t

− ηt2Xt−1

σt,

∂ t

∂α(ϕ0) = (1− η2

t )ε2t−1

σ 2t

,

∂2 t

∂α2(ϕ0) =

ε4t−1

σ 4t

,

∂2 t

∂a2(ϕ0) = −ηt 8α0εt−1Xt−1Xt−2

σ 3t

+ (2η2t − 1)

(2α0εt−1Xt−2)2

σ 4t

+ (1− η2t )

2α0X2t−2

σ 2t

+ 2X2t−1

σ 4t

,

∂2 t

∂a∂α(ϕ0) = η2

t

2α0ε3t−1Xt−2

σ 4t

− (1− η2t )

2α0ε3t−1Xt−2

σ 4t

+ ηt2ε2

t−1Xt−1

σ 3t

.

Page 424: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

408 GARCH MODELS

Letting I = (Iij ), J = (Jij ) and µ3 = Eη3t , we then obtain

I11 = Eϕ0

(∂ t

∂a(ϕ0)

)2

= 4α20(κη − 1)Eϕ0

(ε2t−1X

2t−2

σ 4t

)+ 4Eϕ0

(X2t−1

σ 2t

)

− 4α0µ3Eϕ0

(εt−1Xt−1Xt−2

σ 3t

),

I12 = I12 = Eϕ0

(∂ t

∂a

∂ t

∂α(ϕ0)

)

= 2α0(κη − 1)Eϕ0

(ε3t−1Xt−2

σ 4t

)− 2µ3Eϕ0

(ε2t−1Xt−1

σ 3t

),

I22 = Eϕ0

(∂ t

∂α(ϕ0)

)2

= (κη − 1)Eθ0

(ε4t−1

σ 4t

),

J11 = Eϕ0

(∂2 t

∂a2

)= 4α2

0Eϕ0

(ε2t−1X

2t−2

σ 4t

)+ 2Eϕ0

(X2t−1

σ 2t

),

J12 = J21 = Eϕ0

(∂2 t

∂a∂α

)= 2α0Eϕ0

(ε3t−1Xt−2

σ 4t

,

)

J22 = Eϕ0

(∂2 t

∂α2

)= Eθ0

(ε4t−1

σ 4t

).

2. In the case where the distribution of ηt is symmetric we have µ3 = 0 and, using (7.24),I12 = J12 = 0. It follows that

=( I11/J2

11 00 I22/J2

22

).

The asymptotic variance of the ARCH parameter estimator is thus equal to(κη − 1){Eθ0

(ε4t−1/σ

4t

)}−1: it does not depend on a0 and is the same as that of the QMLEof a pure ARCH(1) (using computations similar to those used to obtain (7.1.2)).

3. When α0 = 0, we have σ 2t = 1, and thus EX2

t = 1/(1− a20). It follows that

I =(

4/(1− a20) −2µ2

3−2µ2

3 (κη − 1)κη

), J =

(2/(1− a2

0) 00 κη

),

=(

1− a20 −µ2

3(1− a20)/κη

−µ23(1− a2

0)/κη (κη − 1)/κη

).

We note that the estimation of too complicated a model (since the true process is AR(1)without ARCH effect) does not entail any asymptotic loss of accuracy for the estimationof the parameter a0: the asymptotic variance of the estimator is the same, 1− a2

0 , as ifthe AR(1) model were directly estimated. This calculation also allows us to verify the‘α0 = 0’ column in Table 7.3: for the N(0, 1) law we have µ3 = 0 and κη = 3; for thenormalized χ2(1) distribution we find µ3 = 4/

√2 and κη = 15.

Page 425: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 409

7.9 Let ε > 0 and V (θ0) be such that (7.95) is satisfied. Since θn → θ0 a.s., for n large enoughθn ∈ V (θ0) a.s. We thus have almost surely∥∥∥∥∥ 1

n

n∑t=1

Jt (θn)− J

∥∥∥∥∥ ≤∥∥∥∥∥ 1

n

n∑t=1

Jt (θ0)− J

∥∥∥∥∥+ 1

n

n∑t=1

∥∥Jt(θn)− Jt (θ0)∥∥

≤∥∥∥∥∥ 1

n

n∑t=1

Jt (θ0)− J

∥∥∥∥∥+ 1

n

n∑t=1

supθ∈V (θ0)

‖Jt (θ)− Jt (θ0)‖.

It follows that

limn→∞

∥∥∥∥∥ 1

n

n∑t=1

Jt (θn)− J

∥∥∥∥∥ ≤ ε

and, since ε can be chosen arbitrarily small, we have the desired result.In order to give an example where (7.95) is not satisfied, let us consider the autoregressive

model Xt = θ0Xt−1 + ηt where θ0 = 1 and (ηt ) is an iid sequence with mean 0 and variance1. Let Jt (θ) = Xt − θXt−1. Then Jt (θ0) = ηt and the first convergence of the exercise holdstrue, with J = 0. Moreover, for all neighborhoods of θ0,

1

n

n∑t=1

supθ∈V (θ0)

|Jt(θ)− Jt (θ0)| =(

1

n

n∑t=1

|Xt−1|)

supθ∈V (θ0)

|θ − θ0| → +∞, a.s.,

because the sum in brackets converges to +∞, Xt being a random walk and the supremumbeing strictly positive. Thus (7.95) is not satisfied. Nevertheless, we have

1

n

n∑t=1

Jt (θn) = 1

n

n∑t=1

(Xt − θnXt−1)

= 1

n

n∑t=1

ηt +√n(θn − 1)

1

n3/2

n∑t=1

Xt−1

→ J = 0, in probability.

Indeed, n−3/2∑nt=1 Xt−1 converges in law to a nondegenerate random variable (see, for

instance, Hamilton, 1994, p. 406) whereas√n(θn − 1)→ 0 in probability since n(θn − 1)

has a nondegenerate limit distribution.

7.10 It suffices to show that J−1 − θ0θ′0 is positive semi-definite. Note that θ ′0(∂σ

2t (θ0)/∂θ) =

σ 2t (θ0). It follows that

θ ′0J = E(Zt ), where Zt = 1

σ 2t (θ0)

∂σ 2t (θ0)

∂θ.

Therefore J − J θ0θ′0J = Var(Zt ) is positive semi-definite. Thus

y′J (J−1 − θ0θ′0)Jy = y ′(J − J θ0θ

′0J )y ≥ 0, ∀y ∈ Rq+1.

Setting x = Jy, we then have

x ′(J−1 − θ0θ′0)x ≥ 0, ∀x ∈ Rq+1,

which proves the result.

Page 426: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

410 GARCH MODELS

7.11 .1. In the ARCH case, we have θ ′0(∂σ2t (θ0)/∂θ) = σ 2

t (θ0). It follows that

E

{1

σ 2t (θ0)

∂σ 2t (θ0)

∂θ ′

(1− θ ′0

σ 2t (θ0)

∂σ 2t (θ0)

∂θ

)}= 0,

or equivalently �′ = θ ′0J , that is, �′J−1 = θ ′0. We also have θ ′0J θ0 = 1, and thus

1 = θ ′0J θ0 = �′J−1JJ−1� = �′J−1�.

2. Introducing the polynomial Bθ (z) = 1−∑p

j=1 βj zj , the derivatives of σ 2

t (θ) satisfy

Bθ (L)∂σ 2

t

∂ω(θ) = 1,

Bθ (L)∂σ 2

t

∂αi(θ) = ε2

t−i , i = 1, . . . , q

Bθ (L)∂σ 2

t

∂βj(θ) = σ 2

t−j , j = 1, . . . , p.

It follows that

Bθ (L)∂σ2t (θ)

∂θ ′θ = ω +

q∑i=1

αiε2t−i = Bθ (L)σ 2

t (θ).

In view of assumption A2 and Corollary 2.2, the roots of Bθ (L) are outside the unit disk,and the relation follows.

3. It suffices to replace θ0 by θ0 in 1.

7.12 Only three cases have to be considered, the other ones being obtained by symmetry. Ift1 < min{t2, t3, t4}, the result is obtained from (7.24) with g = 1 and t − j = t1. If t2 = t3 <

t1 < t4, the result is obtained from (7.24) with g(ε2t , . . .) = ε2

t , t = t2 and t − j = t1. If t2 =t3 = t4 < t1, the result is obtained from (7.24) with g(ε2

t , . . .) = ε2t , j = 0, t2 = t3 = t4 = t

and f (εt−j−1, . . .) = εt1 .

7.13 .1. It suffices to apply (7.38), and then to apply Corollary 2.1 on page 26.

2. The result follows from the Lindeberg central limit theorem of Theorem A.3 on page 345.

3. Using (7.39) and the convergence of ε2t−1 to +∞,

1

n

n∑t=1

∂2

∂α2 t (α0) = 1

n

n∑t=1

(2η2t − 1)

(ε2t−1

1+ α0ε2t−1

)2

→ 1

α20

a.s.

4. In view of (7.50) and the fact that ∂2σ 2t (α)/∂α

2 = ∂3σ 2t (α)/∂α

3 = 0, we have

∣∣∣∣ ∂3

∂α3 t (α)

∣∣∣∣ =∣∣∣∣∣∣{

2− 6(1+ α0ε

2t−1)η

2t

1+ αε2t−1

}(ε2t−1

1+ αε2t−1

)3∣∣∣∣∣∣

≤{

2+ 6(

1+ α0

α

)η2t

} 1

α3.

Page 427: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 411

5. The derivative of the criterion is equal to zero at αcn. A Taylor expansion of this derivativearound α0 then yields

0 = 1√n

n∑t=1

∂α t (α0)+ 1

n

n∑t=1

∂2

∂α2 t (α0)

√n(αcn − α0)

+ 1

n

n∑t=1

∂3

∂α3 t (α

∗)√n(αcn − α0)

2

2,

where α∗ is between αcn and α0. The result easily follows from the previous questions.

6. When ω0 = 1, we have

∂α t (α0) =

(1− ε2

t

1+ α0ε2t−1

)ε2t−1

1+ α0ε2t−1

= (1− η2t

) ε2t−1

1+ α0ε2t−1

+ dt ,

with

dt =ε2t−1(1− ω0)

(1+ α0ε2t−1)

2η2t .

Since dt → 0 a.s. as t →∞, the convergence in law of part 2 always holds true. More-over,

∂2

∂α2 t (α0) =

(2

ε2t

1+ α0ε2t−1

− 1

)(ε2t−1

1+ α0ε2t−1

)2

= (2η2t − 1)

(ε2t−1

1+ α0ε2t−1

)2

+ d∗t ,

with

d∗t = 2(ω0 − 1)η2

t

(1+ α0ε2t−1)

(ε2t−1

1+ α0ε2t−1

)2

= o(1) a.s.,

which implies that the result obtained in part 3 does not change. The same is true for part4 because ∣∣∣∣ ∂3

∂α3 t (α)

∣∣∣∣ =∣∣∣∣∣∣{

2− 6(ω0 + α0ε

2t−1)η

2t

1+ αε2t−1

}(ε2t−1

1+ αε2t−1

)3∣∣∣∣∣∣

≤{

2+ 6(ω0 + α0

α

)η2t

} 1

α3.

Finally, it is easy to see that the asymptotic behavior of αcn(ω0) is the same as that ofαcn(ω), regardless of the value that is fixed for ω.

7. In practice ω0 is not known and must be estimated. However, it is certainly impossible toestimate the whole parameter (ω0, α0) without the strict stationarity assumption. Moreover,under (7.14), the ARCH(1) model generates explosive trajectories which do not look liketypical trajectories of financial returns.

Page 428: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

412 GARCH MODELS

7.14 .1. Consider a constant α ∈ (0, α0). We begin by showing that αcn(1)>α for n large enough.Note that

αcn(1) = arg minα∈[0,∞)

Qn(α), Qn(α) = 1

n

n∑t=1

{ t (α)− t (α0)} .

We have

Qn(α) = 1

n

n∑t=1

η2t

{σ 2t (α0)

σ 2t (α)

− 1

}+ log

σ 2t (α)

σ 2t (α0)

= 1

n

n∑t=1

η2t

(α0 − α)ε2t−1

1+ αε2t−1

+ log1+ αε2

t−1

1+ α0ε2t−1

.

In view of the inequality x ≥ 1+ log x for all x > 0, it follows that

infα<α

Qn(α) ≥ log1

n

n∑t=1

η2t + log

(α0 − α)ε2t−1

1+ α0ε2t−1

+ 1.

For all M> 0, there exists an integer tM such that ε2t >M for all t > tM . This entails that

lim infn→∞ inf

α<αQn(α) ≥ log

(α0 − α)M

1+ α0M+ 1.

Since M is arbitrarily large,

lim infn→∞ inf

α<αQn(α) ≥ log

α0 − α

α0+ 1> 0 (C.10)

provided that α < (1− e−1)α0. If α is chosen so that the constraint is satisfied, the inequal-ities

lim supn→∞

Qn(αn) ≤ lim supn→∞

Qn(α0) = 0

and (C.10) show that

limn→∞ αn ≥ α a.s. (C.11)

We will define a criterion On asymptotically equivalent to the criterion Qn. Since ε2t−1 →

∞ a.s. as t →∞, we have for α = 0,

limn→∞Qn(α) = lim

n→∞On(α),

where

On(α) = 1

n

n∑t=1

η2t

(α0 − α)

α+ log

α

α0.

On the other hand, we have

limn→∞On(α) = α0

α− 1+ log

α

α0> 0

Page 429: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 413

when α0/α = 1. We will now show that Qn(α)−On(α) converges to zero uniformly inα ∈ (α,∞). We have

Qn(α)−On(α) = 1

n

n∑t=1

η2t

α − α0

α(1+ αε2t−1)

+ log(1+ αε2

t−1)α0

(1+ α0ε2t−1)α

.

Thus for all M> 0 and any ε > 0, almost surely

|Qn(α)−On(α)| ≤ (1+ ε)|α − α0|α2M

+ |α − α0|αα0M

,

provided n is large enough. In addition to the previous constraints, assume that α < 1.We have |α − α0|/α2M ≤ α0/α

2M for any α < α ≤ α0, and

|α − α0|α2M

≤ α

α2M≤ 1

αM≤ 1

α2M

for any α ≥ α0. We then have

limn→∞ sup

α>α

|Qn(α)−On(α)| ≤ (1+ ε)1+ α0

α2M+ 1+ α0

αα0M.

Since M can be chosen arbitrarily large and ε arbitrarily small, we have almost surely

limn→∞ sup

α >α

|Qn(α)−On(α)| = 0. (C.12)

For the last step of the proof, let α−0 and α+0 be two constants such that α−0 < α0 < α+0 . Itcan always be assumed that α < α−0 . With the notation σ 2

η = n−1∑nt=1 η

2t , the solution of

α∗n = arg minαOn(α)

is α∗n = α0σ2η . This solution belongs to the interval (α−0 , α

+0 ) when n is large enough. In

this caseα∗∗n = arg min

α ∈(α−0 ,α+0 )On(α)

is one of the two extremities of the interval (α−0 , α+0 ), and thus

limn→∞On(α

∗∗n ) = min

{limn→∞On(α

−0 ), lim

n→∞On(α+0 )}> 0.

This result, (C.12), the fact that minα Qn(α) ≤ Qn(α0) = 0 and (C.11) show that

limn→∞ arg min

α≥0Qn(α) ∈ (α−0 , α+0 ).

Since (α−0 , α+0 ) is an arbitrarily small interval that contains α0 and αn = arg minα Qn(α),

the conclusion follows.

2. It can be seen that the constant 1 does not play any particular role and can be replacedby any other positive number ω. However, we cannot conclude that αn → α0 a.s. becauseαn = αcn(ωn), but ωn is not a constant. In contrast, it can be shown that under the strictstationarity condition α0 < exp{−E(log η2

t )} the constrained estimator αcn(ω) does notconverge to α0 when ω = ω0.

Page 430: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

414 GARCH MODELS

Chapter 8

8.1 Let the Lagrange multiplier λ ∈ Rp. We have to maximize the Lagrangian

L(x, λ) = (x − x0)′J (x − x0)+ λ′Kx.

Since at the optimum

0 = ∂L(x, λ)∂x

= 2Jx − 2Jx0 +K ′λ,

the solution is such that x = x0 − 12J

−1K ′λ. Since 0 = Kx = Kx0 − 12KJ

−1K ′λ, we obtain

λ = ( 12KJ

−1K ′)−1Kx0, and then the solution is

x = x0 − J−1K ′ (KJ−1K ′)−1Kx0.

8.2 Let K be the p × n matrix such that K(1, i1) = · · · = K(p, ip) = 1 and whose the otherelements are 0. Using Exercise 8.1, the solution has the form

x ={In − J−1K ′ (KJ−1K ′)−1

K}x0. (C.13)

Instead of the Lagrange multiplier method, a direct substitution method can also be used.The constraints xi1 = · · · = xip = 0 can be written as

x = Hx∗

where H is n× (n− p), of full column rank, and x∗ is (n− p)× 1 (the vector of the nonzero

components of x). For instance: (i) if n = 3, x2 = x3 = 0 then x∗ = x1 and H = 1

00

;(ii) if n = 3, x3 = 0 then x∗ = (x1, x2)

′ and H = 1 0

0 10 0

.

If we denote by Col(H) the space generated by the columns of H , we thus have to find

minx∈Col(H)

‖x − x0‖J

where ‖.‖J is the norm ‖z‖J =√z′J z.

This norm defines the scalar product 〈z, y〉J = z′Jy. The solution is thus the orthogonal(with respect to this scalar product) projection of x0 on Col(H). The matrix of such aprojection is

P = H(H ′JH)−1H ′J.

Indeed, we have P 2 = P , PHz = Hz, thus Col(H) is P -invariant, and 〈Hy, (I − P)z〉J =y ′H ′J (I − P)z = y ′H ′J z− y ′H ′JH(H ′JH)−1H ′J z = 0, thus z− Pz is orthogonal toCol(H).

It follows that the solution is

x = Px0 = H(H ′JH)−1H ′Jx0. (C.14)

Page 431: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 415

This last expression seems preferable to (C.13) because it only requires the inversion ofthe matrix H ′JH of size n− p, whereas in (C.14) the inverse of J , which is of size n, isrequired.

8.3 In case (a), we have

K = (0, 0, 1) and H = 1 0

0 10 0

,

and then

(H ′JH)−1 =(

2 −1−1 2

)−1

=(

2/3 1/31/3 2/3

),

H(H ′JH)−1H ′ = 2/3 1/3 0

1/3 2/3 00 0 0

and, using (C.14),

P = H(H ′JH)−1H ′J = 1 0 1/3

0 1 2/30 0 0

which gives a constrained minimum at

x = x01 + x03/3

x02 + x03/30

.

In case (b), we have

K =(

0 1 00 0 1

)and H =

100

, (C.15)

and, using (C.14), a calculation, which is simpler than the previous one (we do not have toinvert any matrix since H ′JH is scalar), shows that the constrained minimum is at

x = x01 − x02/2

00

.

The same results can be obtained with formula (C.13), but the computations are longer,in particular because we have to compute

J−1 = 3/4 1/2 −1/4

1/2 1 −1/2−1/4 −1/2 3/4

. (C.16)

Page 432: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

416 GARCH MODELS

8.4 Matrix J−1 is given by (C.16). With the matrix K1 = K defined by (C.15), and denoting byK2 and K3 the first and second rows of K , we then obtain

I3 − J−1K ′ (KJ−1K ′)−1K =

1 −1/2 00 0 00 0 0

,

I3 − J−1K ′2

(K2J

−1K ′2

)−1K2 =

1 −1/2 00 0 00 1/2 1

,

I3 − J−1K ′3

(K3J

−1K ′3

)−1K3 =

1 0 1/30 1 2/30 0 0

.

It follows that the solution will be found among (a) λ = Z,

(b) λ = Z1 − Z2/2

00

, (c) λ = Z1 − Z2/2

0Z3 + Z2/2

, (d) λ = Z1 + Z3/3

Z2 + 2Z3/30

.

The value of Q(λ) is 0 in case (a), 3Z22/2+ 2Z2Z3 + 2Z2

3 in case (b), Z22 in case (c) and

Z23/3 in case (d).

To find the solution of the constrained minimization problem, it thus suffices to take thevalue λ which minimizes Q(λ) among the subset of the four vectors defined in (a)–(d) whichsatisfy the positivity constraints of the two last components.

We thus find the minimum at λ! = Z = (−2, 1, 2)′ in case (i), at

λ! = −3/2

03/2

in case (ii) where Z = −2−12

,

λ! = −5/2

00

in case (iii) where Z = −2

1−2

and

λ! = −3/2

00

in case (iv) where Z = −2−1−2

.

8.5 Recall that for a variable Z ∼ N(0, 1), we have EZ+ = −EZ− = (2π)−1/2 and Var(Z+) =Var(Z−) = 1

2 (1− 1/π). We have

Z = Z1

Z2

Z3

∼ N0, = (κη − 1)J−1 =

(κη + 1)ω20 −ω0 −ω0

−ω0 1 0−ω0 0 1

.

It follows that

Eλ! =

−ω√

1√2π1√2π

.

Page 433: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 417

The coefficient of the regression of Z1 on Z2 is −ω0. The components of the vector (Z1 +ω0Z2, Z2) are thus uncorrelated and, this vector being Gaussian, they are independent. Inparticular E(Z1 + ω0Z2)Z

−2 = 0, which gives Cov(Z1, Z

−2 ) = EZ1Z

−2 = −ω0E(Z2Z

−2 ) =

−ω0E(Z−2 )

2 = −ω0/2. We thus have

Var(Z1 + ω0Z−2 + ω0Z

−3 ) = (κη + 1)ω2

0 + ω20

(1− 1

π

)− 2ω2

0 =(κη − 1

π

)ω2

0.

Finally,

Var(λ!) = 1

2

(1− 1

π

) 2 κηπ−1π−1 ω2

0 −ω0 −ω0

−ω0 1 0−ω0 0 1

.

It can be seen that

Var(Z)− Var(λ!) = 1

2

(1+ 1

π

) 2ω20 −ω0 −ω0

−ω0 1 0−ω0 0 1

is a positive semi-definite matrix.

8.6 At the point θ0 = (ω0, 0, . . . , 0), we have

∂σ 2t (θ0)

∂θ= (1, ω0η

2t−1, . . . , ω0η

2t−q)′

and the information matrix (written for simplicity in the ARCH(3) case) is equal to

J (θ0) := Eθ0

(1

σ 4t (θ0)

∂σ 2t (θ0)

∂θ

∂σ 2t (θ0)

∂θ ′

)

= 1

ω20

1 ω0 ω0 ω0

ω0 ω20κη ω2

0 ω20

ω0 ω20 ω2

0κη ω20

ω0 ω20 ω2

0 ω20κη

.

This matrix is invertible (which is not the case for a general GARCH(p, q)). We finallyobtain

(θ0) = (κη − 1)J (θ0)−1 =

(κη + q − 1)ω2 −ω · · · −ω

−ω...

−ωIq

.

8.7 We have σ 2t = ω + αε2

t−1, ∂σ 2t /∂θ

′ = (1, ε2t−1) and

J := Eθ0

1

σ 4t

∂σ 2t

∂θ

∂σ 2t

∂θ ′(θ0) = Eθ0

1

σ 4t (θ0)

(1 ε2

t−1ε2t−1 ε4

t−1

)= 1

ω20

(1 ω0

ω0 ω20κη

),

and thus

J−1 = 1

κη − 1

(ω2

0κη −ω0

−ω0 1

).

Page 434: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

418 GARCH MODELS

In view of Theorem 8.1 and (8.15), the asymptotic distribution of√n(θ − θ0) is that of the

vector λ! defined by

λ! =(

Z1

Z2

)− Z−2

( −ω0

1

)=(

Z1 + ω0Z−2

Z+2

).

We have EZ+2 = −EZ−2 = (2π)−1/2, thus

Eλ! = 1√2π

( −ω0

1

).

Since the components of the Gaussian vector (Z1 + ω0Z2, Z2) are uncorrelated, they areindependent, and it follows that

Cov(Z1, Z−2 ) = −ω0Cov(Z2, Z

−2 ) = −ω0/2.

We then obtain

Varλ! =(

ω20

{κη − 1

2

(1+ 1

π

)} −ω012

(1− 1

π

)−ω0

12

(1− 1

π

) 12

(1− 1

π

) ).

Let f (z1, z2) be the density of Z, that is, the density of a centered normal with variance(κη − 1)J−1. It is easy to show that the distribution of Z1 + ω0Z

−2 admits the density h(x) =∫∞

0 f (x, z2)dz2 +∫ 0−∞ f (x − ω0z2, z2)dz2 and to check that this density is asymmetric.

A simple calculation yields E(Z−2)3 = −2/

√2π . From E(Z1 + ω0Z2)

(Z−2)2 = 0,

we then obtain EZ1(Z−2)2 = 2ω0/

√2π . And from E(Z1 + ω0Z2)

2(Z−2) = E(Z1 +

ω0Z2)2E(Z−2)

we obtain EZ21Z

−2 = −ω2

0(κη + 1)/√

2π . Finally, we obtain

E(Z1 + ω0Z−2 )

3 = 3ω0EZ21Z

−2 + 3ω2

0EZ1(Z−2)2 + ω3

0E(Z−2)3

= ω30√

(1− 3κη

).

8.8 The statistic of the C test is N(0, 1) distributed under H0. The p-value of C is thus 1−�(n−1/2∑n

i=1 Xi

). Under the alternative, we have almost surely n−1/2∑n

i=1 Xi ∼ √nθ asn→+∞. It can be shown that log {1−�(x)} ∼ −x2/2 in the neighborhood of +∞. InBahadur’s sense, the asymptotic slope of the C test is thus

c(θ) = limn→∞

−2

n× −

(√nθ)2

2= θ2, θ > 0.

The p-value of C∗ is 2(1−�(∣∣n−1/2∑n

i=1 Xi

∣∣). Since log 2 {1−�(x)} ∼ −x2/2 in theneighborhood of +∞, the asymptotic slope of C∗ is also c∗(θ) = θ2 for θ > 0. The C andC∗ tests having the same asymptotic slope, they cannot be distinguished by the Bahadurapproach.

We know that C is uniformly more powerful than C∗. The local power of C is thus alsogreater than that of C∗ for all τ > 0. It is also true asymptotically as n→∞, even if thesample is not Gaussian. Indeed, under the local alternatives τ/

√n, and for a regular statistical

model, the statistic n−1/2∑ni=1 Xi is asymptotically N(τ, 1) distributed. The local asymptotic

power of C is thus γ (τ) = 1−�(c − τ) with c = �−1(1− α). The local asymptotic powerof C∗ is γ ∗(τ ) = 1−�(c∗ − τ)+�(−c∗ − τ) , with c∗ = �−1(1− α/2). The differencebetween the two asymptotic powers is

D(τ) = γ (τ)− γ ∗(τ ) = −�(c − τ)+�(c∗ − τ

)−�(−c∗ − τ

)

Page 435: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 419

and, denoting the N(0, 1) density by φ(x), we have

D′(τ ) = φ (c − τ)− φ(c∗ − τ

)+ φ(−c∗ − τ

) = φ (τ) ecτ g(τ ),

whereg(τ) = e−c

2/2 + ec∗2/2

(−e(c∗−c)τ + e−(c

∗+c)τ).

Since 0 < c < c∗, we have

g′(τ ) = ec∗2/2

{−(c∗ − c)e(c

∗−c)τ − (c∗ + c)e−(c∗+c)τ

}< 0.

Thus, g(τ) is decreasing on [0,∞). Note that g(0)> 0 and limτ→+∞ g(τ) = −∞. The signof g(τ), which is also the sign of D′(τ ), is positive when τ ∈ [0, a] and negative whenτ ∈ [a,∞), for some a > 0. The function D thus increases on [0, a] and decreases on [a,∞).Since D(0) = 0 and limτ→+∞D(τ) = 0, we have D(τ)> 0 for all τ > 0. This shows that,in Pitman’s sense, the test C is, as expected, locally more powerful than C∗ in the Gaussiancase, and locally asymptotically more powerful than C∗ in a much more general framework.

8.9 The Wald test uses the fact that

√n(Xn − θ) ∼ N(0, σ 2) and S2

n → σ 2 a.s.

To justify the score test, we remark that the log-likelihood constrained by H0 is

−n2

log σ 2 − 1

2σ 2

n∑i=1

X2i

which gives∑

X2i /n as constrained estimator of σ 2. The derivative of the log-likelihood

satisfies1√n

∂(θ, σ 2)logLn(θ, σ

2) =(n−1/2∑

i Xi

n−1∑

i X2i

, 0

)

at (θ, σ 2) = (0,∑

X2i /n). The first component of this score vector is asymptotically N(0, 1)

distributed under H0. The third test is of course the likelihood ratio test, because the uncon-strained log-likelihood at the optimum is equal to −(n/2) log S2

n − (n/2) whereas the max-imal value of the constrained log-likelihood is −(n/2) log

∑X2i /n− (n/2). Note also that

Ln = n log(1+X2n/S

2n) ∼ Wn under H0.

The asymptotic level of the three tests is of course α, but using the inequality x1+x <

log(1+ x) < x for x > 0, we have

Rn ≤ Ln ≤ Wn,

with almost surely strict inequalities in finite samples, and also asymptotically under H1. Thisleads us to think that the Wald test will reject more often under H1.

Since S2n is invariant by translation of the Xi , S2

n tends almost surely to σ 2 both underH0 and under H1, as well as under the local alternatives Hn(τ) : θ = τ/

√n. The behavior of∑

X2i /n under Hn(τ) is the same as that of

∑(Xi + τ/

√n)2/n under H0, and because

1

n

n∑i=1

(Xi + τ√

n

)2

= 1

n

n∑i=1

X2i +

τ 2

n+ 2

τ√nXn → σ 2 a.s.

under H0, we have∑

X2i /n→ σ 2 both under H0 and under Hn(τ). Similarly, it can be

shown that Xn/Sn → 0 under H0 and under Hn(τ). Using these two results and x/(1+ x) ∼

Page 436: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

420 GARCH MODELS

log(1+ x) in the neighborhood of 0, it can be seen that the statistics Ln, Rn and Wn areequivalent under Hn(τ). Therefore, the Pitman approach cannot distinguish the three tests.

Using logP(χ2r > x) ∼ −x/2 for x in the neighborhood of +∞, the asymptotic Bahadur

slopes of the tests C1, C2 and C3 are respectively

c1(θ) = limn→∞

−2

nlogP(χ2

1 >Wn) = limn→∞

X2n

S2n

= θ2

σ 2

c2(θ) = θ2

σ 2 + θ2, c3(θ) = log

(σ 2 + θ2

σ 2

)under H1,

Clearly

c2(θ) = c1(θ)

1+ c1(θ)< c3(θ) = log {1+ c1(θ)} < c1(θ).

Thus the ranking of the tests, in increasing order of relative efficiency in the Bahadur sense, is

score ≺ likelihood ratio ≺ Wald.

All the foregoing remains valid for a regular non-Gaussian model.

8.10 In Example 8.2, we saw that

λ! = Z − Z−d c, c =

γ1...

γd

, γi = E(ZdZi)

Var(Zd).

Note that Var(Zd)c corresponds to the last column of VarZ = (κη − 1)J−1. Thus c is thelast column of J−1 divided by the (d, d)th element of this matrix. In view of Exercise 6.7,this element is (J22 − J21J

−111 J12)

−1. It follows that J c = (0, . . . , 0, J22 − J21J−111 J12)

′ andc′J c = J22 − J21J

−111 J12. By (8.24), we thus have

L = −1

2(Z−d )

2c′J c+ 1

2Z2d (J22 − J21J

−111 J12)

= 1

2(Z+d )

2(J22 − J21J−111 J12) = κη − 1

2

(Z+d )2

Var(Zd).

This shows that the statistic 2/(κη − 1)Ln has the same asymptotic distribution as the Waldstatistic Wn, that is, the distribution δ0/2+ χ2

1 /2 in the case d2 = 1.

8.11 Using (8.29) and Exercise 8.6, we have

L = −1

2

(

d∑i=2

Z−i

)2

+ κη

d∑i=2

(Z−i)2 − 2

(d∑i=2

Z−i

)2

+d∑

i,j=2i =j

Z−i Z−j − (κη − 1)

d∑i=2

Z2i

= κη − 1

2

d∑i=2

(Z+i)2.

The result then follows from (8.30).

8.12 Since XY = 0 almost surely, we have P(XY = 0) = 0. By independence, we have P(XY =0) = P(X = 0 and Y = 0) = P(X = 0)P (Y = 0). It follows that P(X = 0) = 0 or P(Y =0) = 0.

Page 437: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 421

Chapter 9

9.1 Substituting y = x/σt , and then integrating by parts, we obtain∫f (y)dy +

∫yf ′(y)dy = 1+ lim

a,b→∞[yf (y)]a−b −

∫f (y)dy = 0.

Since σ 2t and ∂σ 2

t /∂θ belong to the σ -field Ft−1 generated by {εu : u < t}, and since thedistribution of εt given Ft−1 has the density σ−1

t f (·/σt ), we have

E

{∂

∂θlogLn,f (θ0)

∣∣∣∣Ft−1

}= 0,

and the result follows. We can also appeal to the general result that a score vector is centered.

9.2 It suffices to use integration by parts.

9.3 We have

!(θ, θ0, x) = − (x − θ)2

2σ 2+ (x − θ0)

2

2σ 2= θ2

0 − θ2

2σ 2+ x

(θ − θ0)

σ 2.

Thus (aX + b

!(θ, θ0, X)

)∼ N

{(aθ0 + b

− (θ−θ0)2

2σ 2

),

(a2σ 2 a(θ − θ0)

a(θ − θ0)(θ−θ0)

2

σ 2

)}

when X ∼ N(θ0, σ2), and(

aX + b

!(θ, θ0, X)

)∼ N

{(aθ + b

(θ0−θ)(θ0−3θ)2σ 2

),

(a2σ 2 a(θ − θ0)

a(θ − θ0)(θ−θ0)

2

σ 2

)}

when X ∼ N(θ, σ 2). Note that

Eθ(aX + b) = Eθ0 (aX + b)+ Covθ0 {aX + b,!(θ, θ0, X)} ,

as in Le Cam’s third lemma.

9.4 Recall that∂

∂θlogLn,fλ(ϑ) = −

n∑t=1

λ

(1−

∣∣∣∣ εtσt∣∣∣∣λ)

1

2σ 2t

∂σ 2t

∂θ

and∂

∂λlogLn,fλ(ϑ) =

n∑t=1

{1

λ+(

1−∣∣∣∣ εtσt∣∣∣∣λ)

log

∣∣∣∣ εtσt∣∣∣∣}.

Using the ergodic theorem, the fact that(1− |ηt |λ

)is centered and independent of the past,

as well as elementary calculations of derivatives and integrals, we obtain

−n−1 ∂2

∂θ∂θ ′logLn,fλ(ϑ0) = n−1

n∑t=1

λ20 |ηt |λ

1

4σ 4t

∂2σ 2t

∂θ∂θ ′(θ0)+ o(1)→ J11,

−n−1 ∂2

∂λ∂θ ′logLn,fλ(ϑ0) = −n−1

n∑t=1

λ0 |ηt |λ log |ηt | 1

2σ 2t

∂σ 2t

∂θ(θ0)+ o(1)→ J12,

Page 438: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

422 GARCH MODELS

and

−n−1 ∂2

∂λ2logLn,fλ(ϑ0) = n−1

n∑t=1

{1

λ20

+ |ηt |λ (log |ηt |)2

}→ J22

almost surely.

9.5 Jensen’s inequality entails that

E log σf (ησ)− E log f (η) = E logσf (ησ)

f (η)

≤ logEσf (ησ)

f (η)= log

∫σf (yσ)dy = 0,

where the inequality is strict if σf (ησ)/f (η) is nonconstant. If this ratio of densities werealmost surely constant, it would be almost surely equal to 1, and we would have

E|η|r =∫|x|rf (x)dx =

∫|x|rσf (xσ)dx = σ−r

∫|x|rf (x)dx = E|η|r/σ r ,

which is possible only when σ = 1.

9.6 It suffices to note that τ 2 ,fb

= τ 2fb,fb

(= Efbη2t − 1).

9.7 The second-order moment of the double �(b, p) distribution is p(p + 1)/b2. Therefore,to have Eη2

t = 1, the density f of ηt must be the double �(√

p(p + 1), p). We then

obtain

1+ f ′(x)f (x)

x = p −√p(p + 1)|x|.

Thus If = p and τ 2f,f = 1/p. We then show that κη := ∫ x4fp(x)dx = (3+ p)(2+

p)/p(p + 1). It follows that τ 2φ,f /τ

2f,f = (3+ 2p)/(2+ 2p).

To compare the ML and Laplace QML, it is necessary to normalize in such away that E|ηt | = 1, that is, to take the double � (p, p) as density f . We then obtain1+ xf ′(x)/f (x) = p − p|x|. We always have If = τ−2

f,f = p2(Eη2t − 1) = p, and we have

τ 2 ,f = (Eη2

t − 1) = 1/p. It follows that τ 2 ,f /τ

2f,f = 1, which was already known from

Exercise 9.6. This allows us to construct a table similar to Table 9.5.

9.8 Consider the first instrumental density of the table, namely

h(x) = c|x|λ−1 exp(−λ|x|r /r) , λ> 0.

Denoting by c any constant whose value can be ignored, we have

g(x, σ ) = log σ + (λ− 1) log σ |x| − λσ r|x|rr+ c,

g1(x, σ ) = λ

σ− λσr−1|x|r ,

g2(x, σ ) = − λ

σ 2− λ(r − 1)σ r−2|x|r ,

Page 439: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 423

and thus

τ 2h,f =

E(1− |ηt |r )2

r2= E|ηt |2r − 1

r2.

Now consider the second density,

h(x) = c|x|−λ−1 exp(−λ|x|−r/r) .

We have

g(x, σ ) = log σ + (−λ− 1) log σ |x| − λσ−r|x|−rr

+ c,

g1(x, σ ) = −λσ+ λσ−r−1|x|−r ,

g2(x, σ ) = λ

σ 2− λ(r + 1)σ−r−2|x|−r ,

which gives

τ 2h,f =

E(1− |ηt |−r )2r2

.

Consider the last instrumental density,

h(x) = c|x|−1 exp{−λ(log |x|)2} .

We have

g(x, σ ) = log σ − log σ |x| − λ log2(σ |x|)+ c,

g1(x, σ ) = −2λ

σlog(σ |x|),

g2(x, σ ) = 2λ

σ 2log(σ |x|)− 2

λ

σ 2,

and thusτ 2h,f = E(log |ηt |)2.

In each case, τ 2h,f does not depend on the parameter λ of h. We conclude that the estimators

θn,h exhibit the same asymptotic behavior, regardless of the parameter λ. It can even be easilyshown that the estimators themselves do not depend on λ.

9.9 .1. The Laplace QML estimator applied to a GARCH in standard form (as defined inExample 9.4) is an example of such an estimator.

2. We have

σ 2t (θ

∗) = $2ω0 +q∑i=1

$2α0iε2t−i +

p∑j=1

β0jσ2t−j (θ

∗)

= $2

1−p∑j=1

β0jBj

−1 (ω0 +

q∑i=1

α0iε2t−i

)= $2σ 2

t (θ0).

Page 440: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

424 GARCH MODELS

3. Since

∂σ 2t (θ)

∂θ=1−

p∑j=1

βjBj

−1

1ε2t−1...

ε2t−q

σ 2t−1(θ)

...

σ 2t−p(θ)

,

we have

∂σ 2t (θ

∗)∂θ

= $2!$

∂σ 2t (θ0)

∂θ,

1

σ 2t (θ

∗)∂σ 2

t (θ∗)

∂θ= !$

1

σ 2t (θ0)

∂σ 2t (θ0)

∂θ.

It follows that, using obvious notation,

J (θ∗) = !$J (θ0)!$ =(

$−4J11(θ0) $−2J12(θ0)

$−2J21(θ0) J22(θ0)

).

9.10 After reparameterization, the result (9.27) applies with ηt replaced by η∗t = ηt/$, and θ0 by

θ∗ = ($2ω0, $2α01, . . . , $

2α0q, β01, . . . , β0p)′,

where $ = ∫ |x|f (x)dx. Thus, using Exercise 9.9, we obtain

√n(θn, −!−1

$ θ0

) L→ N(

0, 4τ 2 ,f !

−1$ J−1!−1

$

)with τ 2

,f = Eη∗t2 − 1 = Eη2

t /$2 − 1.

Chapter 10

10.1 Note that σt is a measurable function of η2t−1, . . . , η

2t−h and of ε2

t−h−1, ε2t−h−2, . . . , that will

be denoted byh(η2

t−1, . . . , η2t−h, ε

2t−h−1, ε

2t−h−2, . . . ).

Using the independence between ηt−h and the other variables of h, we have, for all h,

Cov(σt , εt−h) = E {E (σtεt−h | ηt−1, . . . , ηt−h+1, εt−h−1, εt−h−2, . . . )}

= E

{∫h(η2

t−1, . . . , η2t−h+1, x

2, ε2t−h−1, ε

2t−h−2, . . . )xσt−hdPη(x)

}= 0

when the distribution Pη is symmetric.

10.2 A sequence Xi of independent real random variables such that Xi = 0 with probabil-ity 1/(i + 1) and Xi = (i + 1)/i with probability 1− 1/(i + 1) is suitable, because Y :=∏∞

i=1 Xi = 0 a.s., EY = 0, EXi = 1 and∏∞

i=1 EXi = 1. We have used P(limn ↓ An) =limn ↓ P(An) for any decreasing sequence of events, in order to show that

P(Y = 0) =∞∏i=1

P(Xi = 0) = exp

{ ∞∑i=1

log(1− 1/i)

}= 0.

Page 441: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 425

10.3 By definition,∞∏i=1

E exp{|λig(ηt )|} = limn→∞

n∏i=1

E exp{|λig(ηt )|}

and, by continuity of the exponential,

limn→∞

n∏i=1

E exp{|λig(ηt )|} = exp

{limn→∞

n∑i=1

logE exp{|λig(ηt )|}}

is finite if and only if the series of general term logE exp{|λig(ηt )|} converges. Using theinequalities exp(EX) ≤ E exp(X) ≤ E exp(|X|), we obtain

λiEg(ηt ) ≤ log gη (λi) ≤ logE exp{|λig(ηt )|}.

Since the λi tend to 0 at an exponential rate and E|g(ηt )| <∞, the series of general termλiEg(ηt ) converges absolutely, and we finally obtain

∞∑i=1

∣∣log gη (λi)∣∣ ≤ ∞∑

i=1

|λiEg(ηt )| +∞∑i=1

logE exp{|λig(ηt )|},

which is finite under condition (10.6).

10.4 Note that (10.7) entails that

ε2t = eω

∗η2t limn→∞

n∏i=1

exp{λig(ηt−i )} = eω∗η2t exp

{limn→∞

n∑i=1

λig(ηt−i )

}

with probability 1. The integral of a positive measurable function being always defined inR+ ∪ {+∞}, using Beppo Levi’s theorem and then the independence of the ηt , we obtain

Eε2t ≤ Eeω

∗η2t exp

{limn→∞ ↑

n∑i=1

|λig(ηt−i )|}

= eω∗

limn→∞ ↑

n∏i=1

E exp {|λig(ηt−i )|} ,

which is of course finite under condition (10.6). Applying the dominated convergencetheorem, and bounding the variables exp

{∑ni=1 λig(ηt−i )

}by the integrable variable

exp{∑∞

i=1 |λig(ηt−i )|}, we then obtain the desired expression for Eε2

t .

10.5 Denoting by φ the density of η ∼ N(0, 1),

Eeλ|η| = 2∫ ∞

0eλxφ(x)dx = 2

∫ ∞

0eλ

2/2φ(x − λ)dx = 2eλ2/2�(λ)

and E|η| = √2/π . With the notation τ = |θ | + |ς |, it follows that

Ee|λgη(η)| ≤ e|λς |E|η|Ee|λ|τ |η| = exp

(|λς |

√2

π+ λ2τ 2

2

)2�(|λ|τ).

It then suffices to use the fact that �(x) is equivalent to 1/2+ xφ(0), and that log 2�(x)is thus equivalent to 2xφ(0), in a neighborhood of 0.

Page 442: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

426 GARCH MODELS

10.6 We can always assume ς = 1. In view of the discussion on page 249, the process Xt = log ε2t

satisfies an ARMA(1, 1) representation of the form

Xt − β1Xt−1 = ω + ut + but−1,

where ut is a white noise with variance σ 2u . Using Var

(log η2

t

) = π2/2 and

Var {g (ηt )} = 1− 2

π+ θ2, Cov

{g(ηt ), log η2

t

} = log 16√2π

,

the coefficients |b| < 1 and σ 2u are such that

σ 2u (1+ b2) = Var

{log η2

t + α1g(ηt−1)− β1 log η2t−1

}= π2

(1+ β1

2)

2+ α2

1

(θ 2 + 1− 2

π

)− 2α1β1

log 16√2π

and

bσ 2u = α1Cov

{g(ηt ), log η2

t

}− β1Var{log η2

t

} = α1log 16√

2π− π2 β1

2.

When, for instance, ω = 1, β1 = 1/2, α1 = 1/2 and θ = −1/2, we obtain

Xt − 0.5Xt−1 = 1+ ut − 0.379685ut−1, Var(ut ) = 0.726849.

10.7 In view of Exercise 3.5, an AR(1) process Xt = aXt−1 + ηt , |a| < 1, in which the noiseηt has a strictly positive density over R, is geometrically β-mixing. Under the station-arity conditions given in Theorem 10.1, if ηt has a density f > 0 and if g (defined in(10.4)) is a continuously differentiable bijection (that is, if θ = 0) then

(log σ 2

t

)is a geo-

metrically β-mixing stationary process. Reasoning as in step (iv) of the proof of Theorem3.4, it is shown that (ηt , σ 2

t ), and then (εt ), are also geometrically β-mixing stationaryprocesses.

10.8 Since ε+t = σtη+t and ε−t = σtη

−t , we have

σt = ω + a(ηt )σt−1, a(ηt ) = α1,+η+t − α1,−η−t + β1.

If the volatility σt is a positive function of {ηu, u < t} that possesses a moment of order 2,then {

1− Ea2(ηt )}Eσ 2

t = ω2 + 2ωEσtEa(ηt )> 0

under conditions (10.10). Thus, condition (10.14) is necessarily satisfied. Conversely, under(10.14) the strict stationarity condition is satisfied because

E log a(ηt ) = 1

2E log a2(ηt ) ≤ 1

2logEa2(ηt ) < 0,

and, as in the proof of Theorem 2.2, it is shown that the strictly stationary solution possessesa moment of order 2.

Page 443: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 427

10.9 Assume the second-order stationarity condition (10.15). Let

a(ηt ) = α1,+η+t − α1,−η−t + β1,

µ|η| = E|ηt | =√

2/π,

µη+ = Eη+t = −Eη−t = 1/√

2π,

µσ, |ε|(h) = E (σt |εt−h|) ,µ|ε| = E|εt | = µ|η|µσ

and

µσ = Eσt = ω

1− 1√2π(α1,+ + α1,−)− β1

,

µa = Ea(ηt ) = 1√2π

(α1,+ + α1,−)+ β1,

µa2 = 1

2(α2

1,+ + α21,−)+

2β1√2π

(α1,+ + α1,−)+ β21 ,

µσ 2 = ω2 + 2ωµaµσ

1− µa2.

Using σt = ω + a(ηt−1)σt−1, we obtain

µσ, |ε|(h) = ωµ|ε| + µaµσ, |ε|(h− 1), ∀h ≥ 2,

µσ, |ε|(1) = ωµ|η|µσ +{

1

2(α1,+ + α1,−)+

√2

πβ1

}µσ2 ,

µσ, |ε|(0) =√

2

πµσ 2 .

We then obtain the autocovariances

γ|ε|(h) := Cov (|εt |, |εt−h|) = µ|η|µσ, |ε|(h)− µ2|ε|

and the autocorrelations ρ|ε|(h) = γ|ε|(h)/γ|ε|(0). Note that γ|ε|(h) = µaγ|ε|(h− 1) for allh> 1, which shows that (|εt |) is a weak ARMA(1, 1) process. In the standard GARCHcase, the calculation of these autocorrelations would be much more complicated because σtis not a linear function of σt−1.

10.10 This is obvious because an APARCH(1, 1) with δ = 1, α1,+ = α1(1− ς1) and α1,− =α1(1+ ς1) corresponds to a TGARCH(1, 1).

10.11 This is an EGARCH(1, 0) with ς = 1, α1(θ + 1) = α+, α1(θ − 1) = −α− andω − α1E|ηt | = α0. It is natural to impose α+ ≥ 0 and α− ≥ 0, so that the volatilityincreases with |ηt−1|. It is also natural to impose α−>α+ so that the effect of a negativeshock is more important than the effect of a positive shock of the same magnitude. Therealways exists a strictly stationary solution

εt = ηt exp(α0){exp(α+ηt−1) 1l{ηt−1≥0} + exp(−α−ηt−1) 1l{ηt−1<0}

},

and this solution possesses a moment of order 2 when

E exp(α+ηt ) 1l{ηt > 0} <∞ and E exp(−α−ηt ) 1l{ηt<0} <∞,

Page 444: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

428 GARCH MODELS

which is the case, in particular, for ηt ∼ N(0, 1). In the Gaussian case, we have

Eσt = eα0{eα

2+/2�(α+)+ eα2−/2�(α−)

},

Eε2t = Eσ 2

t = e2α0{e2α2+�(2α+)+ e2α2−�(2α−)

}and

Cov(σt , εt−1) = Eσtηt−1Eσt−1

= eα0[eα

2+/2 {φ(α+)+ α+�(α+)} − eα2−/2 {φ(α−)+ α−�(α−)}

]Eσt ,

using the calculations of Exercise 10.5 and∫ ∞

0xeλxφ(x)dx = eλ

2/2∫ ∞

−λ(y + λ)φ(y)dy = eλ

2/2 {φ(λ)+ λ�(λ)} .

Since x �→ φ(x)+ x�(x) is an increasing function, provided α−>α+, we observe theleverage effect Cov(σt , εt−1) < 0.

Chapter 11

11.1 The number of parameters of the diagonal GARCH(p, q) model is

m(m+ 1)

2(1+ p + q),

that of the vectorial model is

m(m+ 1)

2

{1+ (p + q)

m(m+ 1)

2

},

that of the CCC model is

m{1+ (p + q)m2}+ m(m− 1)

2,

that of the BEKK model is

m(m+ 1)

2+Km2(p + q).

For p = q = 1 and K = 1 we obtain Table C.9.

11.2 Assume (11.83) and define U(z) = Bθ (z)B−1θ0(z). We have (11.84) because

U(z)Aθ0 (z) = Bθ (z)B−1θ0(z)Aθ0 (z) = Bθ (z)Bθ (z)−1Aθ (z) = Aθ (z)

andU(z)Bθ0 (z) = Bθ (z)B−1

θ0(z)Bθ0 (z) = Bθ (z).

Conversely, it is easy to check that (11.84) implies (11.83).

11.3 .1. Since X and Y are independent, we have

0 = Cov(X, Y ) = Cov(X,X) = Var(X),

which shows that X is constant.

Page 445: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 429

Table C.9 Number of parameters as a function of m.

Model m = 2 m = 3 m = 5 m = 10

Diagonal 9 18 45 165Vectorial 21 78 465 6105CCC 19 60 265 2055BEKK 11 24 96 186

2. We have

P(X = x1)P (X = x2) = P(X = x1)P (Y = x2) = P(X = x1, Y = x2)

= P(X = x1, X = x2)

which is nonzero only if x1 = x2, thus X and Y take only one value.

3. Assume that there exist two events A and B such that P(X ∈ A)P (X ∈ B)> 0 andA ∩ B = ∅. The independence then entails that

P(X ∈ A)P (X ∈ B) = P(X ∈ A)P (Y ∈ B) = P(X ∈ A,X ∈ B),and we obtain a contradiction.

11.4 For all x ∈ Rm(m+1)/2, there exists a symmetric matrix X such that x = vech(X), and wehave

D+mDmx = D+mDmvech(X) = D+mvec(X) = vech(X) = x.

11.5 The matrix A′A being symmetric and real, there exist an orthogonal matrix C (C ′C = CC′ =I ) and a diagonal matrix D such that A′A = CDC ′. Thus, denoting by λj the (positive)eigenvalues of A′A, we have

x′A′Ax = x ′CDC′x = y ′Dy =d∑

j=1

λjy2j ,

where y = C ′x has the same norm as x. Assuming, for instance, that λ1 = ρ(A′A), wehave

sup‖x‖≤1

‖Ax‖2 = sup‖y‖≤1

d∑j=1

λjy2j ≤ λ1

d∑j=1

y2j ≤ λ1.

Moreover, this maximum is reached at y = (1, 0, . . . , 0)′.An alternative proof is obtained by noting that ‖A‖2 solves the maximization problem

of the function f (x) = x ′A′Ax under the constraint x′x = 1. Introduce the Lagrangian

L(x, λ) = x′A′Ax − λ(x ′x − 1).

The first-order conditions yield the constraint and

L(x, λ)∂x

= 2A′Ax − 2λx = 0.

This shows that the constrained optimum is located at a normalized eigenvector xi associatedwith an eigenvalue λi of A′A, i = 1, . . . , d . Since f (xi) = x ′iA

′Axi = λix′ixi = λi , we of

course have ‖A‖2 = maxi=1,...,d λi .

11.6 Since all the eigenvalues of the matrix A′A are real and positive, the largest eigenvalue ofthis matrix is less than the sum of all its eigenvalues, that is, of its trace. Using the second

Page 446: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

430 GARCH MODELS

equality of (11.49), the first inequality of (11.50) follows. The second inequality followsfrom the same arguments, and noting that there are d2 eigenvalues. The last inequality usesthe fact that the determinant is the product of the eigenvalues and that each eigenvalue isless than ‖A‖2.

The first inequality of (11.51) is a simple application of the Cauchy–Schwarz inequality.The second inequality of (11.51) is obtained by twice applying the second inequality of(11.50).

11.7 For the positivity of Ht for all t > 0 it suffices to require � to be symmetric positivedefinite, and the initial values H0, . . . , H1−p to be symmetric positive semi-definite. Indeed,if the Ht−j are symmetric and positive semi-definite then Ht is symmetric if and only � issymmetric, and we have, for all λ ∈ Rm,

λ′Htλ = λ′�λ+q∑i=1

αi{λ′εt−i

}2 +p∑j=1

βjλ′Ht−j λ ≥ λ′�λ.

We now give a second-order stationarity condition. If H := E(εtε′t ) exists, then this

matrix is symmetric positive semi-definite and satisfies

H = �+q∑i=1

αiH +p∑j=1

βjH,

that is, 1−q∑i=1

αi −p∑j=1

βj

H = �.

If � is positive definite, it is then necessary to have

q∑i=1

αi +p∑j=1

βj < 1. (C.17)

For the reverse we use Theorem 11.5. Since the matrices C(i) are of the form ciIs withci = αi + βi , the condition ρ(

∑ri=1 C

(i)) < 1 is equivalent to (C.17). This condition is thussufficient to obtain the stationarity, under technical condition (ii) of Theorem 11.5 (whichcan perhaps be relaxed). Let us also mention that, by analogy with the univariate case, it iscertainly possible to obtain the strict stationarity under a condition weaker than (C.17) but,to out knowledge, this remains an open problem.

11.8 For the convergence in Lp , it suffices to show that (un) is a Cauchy sequence:

‖un − um‖p ≤ ‖un+1 − un‖p + ‖un+2 − un+1‖p + · · · + ‖um − um−1‖p≤ (m− n)C1/pρn/p → 0

when m>n→∞. To show the almost sure convergence, let us begin by noting that, usingHolder’s inequality,

E |un − un−1| ≤{E |un − un−1|p

}1/p ≤ C∗ρ∗n,

Page 447: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 431

with C∗ = C1/p and ρ∗ = ρ1/p. Let v1 = u1, vn = un − un−1 for n ≥ 2, and v =∑n |vn|which is defined in R ∪ {+∞}, a priori . Since

Ev ≤ C∗∞∑n=1

ρ∗n <∞,

it follows that v is almost surely defined in R+ and u =∑∞n=1 vn is almost surely defined

in R. Since un = v1 + · · · + vn, we have u = limn→∞ un almost surely.

11.9 It suffices to note that pR + (1− p)Q is the correlation matrix of a vector of the form√pX +√1− pY , where X and Y are independent vectors of the respective correlation

matrices R and Q.

11.10 Since the βj are linearly independent, there exist vectors αk such that {α1, . . . ,αm} formsa basis of Rm and such that α′kβj = 1l{k=j} for all j = 1, . . . , r and all k = 1, . . . , m.3 Wethen have

λ∗j t := Var(α′j εt | εu, u < t

) = α′jHtαj = α′j�αj + λjt ,

and it suffices to take

�∗ = �−r∑

j=1

(α′j�αj )βjβ′j .

The conditional covariance between the factors α′j εt and α′kεt , for j = k, is

α′jHtαk = α′j�αk,

which is a nonzero constant in general.

11.11 As in the proof of Exercise 11.10, define vectors αj such that

α′jHtαj = α′j�αj + λjt .

Denoting by ej the j th vector of the canonical basis of Rm, we have

Ht = �+r∑

j=1

βj

{ωj + aje′j εt−1ε

′t−1ej + bj

(α′jHt−1αj − α′j�αj

)}β ′j ,

and we obtain the BEKK representation with K = r ,

�∗ = �+r∑

j=1

{ωj − bjα

′j�αj

}βjβ

′j , Ak = √akβke′k, Bk =

√bkβkα

′k.

11.12 Consider the Lagrange multiplier λ1 and the Lagrangian u′1u1 − λ1(u′1u1 − 1). The first-

order conditions yield2u1 − 2λ1u1 = 0,

which shows that u1 is an eigenvector associated with an eigenvalue λ1 of . Left-multiplying the previous equation by u′1, we obtain

λ1 = λ1u′1u1 = u′1u1 = VarC1,

3 The β1, . . . ,βr can be extended to obtain a basis of Rm. Let B be the m×m matrix of these vectors inthe canonical basis. This matrix B is necessarily invertible. We can thus take the lines of B−1 as vectors αk .

Page 448: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

432 GARCH MODELS

which shows that λ1 must be the largest eigenvalue of . The vector u1 is unique, up to itssign, provided that the largest eigenvalue has multiplicity order 1.

An alternative way to obtain the result is based on the spectral decomposition of thesymmetric definite positive matrices

= P!P ′, PP ′ = Im, ! = diag(λ1, . . . , λm), 0 ≤ λm ≤ · · · ≤ λ1.

Let v1 = Pu1, that is, u1 = P ′v1. Maximizing u′1u1 is equivalent to maximizing v′1!v1.The constraint u′1u1 = 1 is equivalent to the constraint v′1v1 = 1. Denoting by vi1 the com-ponents of v1, the function v′1!v1 =

∑mi=1 v

2i1λi is maximized at v1 = (±1, 0, . . . , 0) under

the constraint, which shows that u1 is the first column of P , up to the sign. We also seethat other solutions exist when λ1 = λ2. It is now clear that the vector P ′X contains the mprincipal components of the variance matrix !.

11.13 All the elements of the matrices D+m Aik and Dm are positive. Consequently, when Aik isdiagonal, using Exercise 11.4, we obtain

0 ≤ D+m(Aik ⊗ Aik)Dm ≤ supj∈{1,...,m}

A2ik(j, j)Im(m+1)/2

element by element. This shows that D+m(Aik ⊗ Aik)Dm is diagonal, and the conclusioneasily follows.

11.14 With the abuse of notation B − λImp = C(B1, . . . ,Bp), the property yields

det(B − λImp

) = det(−λIm)det

{C (B1, . . . ,Bp−1

)+ 1

λ

(Bp

0

)(0 Im)

}= det(−λIm)det

{C(

B1, . . . ,Bp−2,Bp−1 + 1

λBp

)}.

The proof is completed by induction on p.

11.15 Let A1/2 be the symmetric positive definite matrix defined by A1/2A1/2 = A. If X is aneigenvector associated with the eigenvalue λ of the symmetric positive definite matrixS = A1/2BA1/2, then we have ABA1/2X = λA1/2X, which shows that the eigenvaluesof S and AB are the same. Write the spectral decomposition as S = P!P ′ where !

is diagonal and P ′P = Im. We have AB = A1/2SA−1/2 = A1/2P!P ′A−1/2 = Q!Q−1,with Q = A1/2P .

11.16 Let c = (c′1, c′2)′ be a nonzero vector such that

c′(A+ B)c = (c′1A11c1 + 2c′1A12c2 + c′2A22c2)+ c′1B11c1.

On the right-hand side of the equality, the term in parentheses is nonnegative and the lastterm is positive, unless c1 = 0. But in this case c2 = 0 and the term in parentheses becomesc′2A22c2 > 0.

11.17 Take the random matrix A = XX′, where X ∼ N(0, Ip) with p> 1. Obviously A is neverpositive definite because this matrix always possesses the eigenvalue 0 but, for all c = 0,c′Ac = (c′X)2 > 0 with probability 1.

Page 449: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 433

Chapter 12

12.1 For ω = 0, we obtain the geometric Brownian motion whose solution, in view of (12.13),is equal to

X0t = x0 exp

{(µ− σ 2/2)t + σWt

}.

By Ito’s formula, the SDE satisfied by 1/X0t is

d

(1

X0t

)= 1

X0t

{(−µ+ σ 2)dt − σdWt }.

Using the hint, we then have

dYt = Xtd

(1

X0t

)+ 1

X0t

dXt + σ 2 Xt

X0t

dt

= ω

X0t

dt.

It follows that

Xt = X0t

(1+ ω

∫ t

0

1

X0s

ds

).

The positivity follows.

12.2 It suffices to check the conditions of Theorem 12.1, with the Markov chain (Xkτ ).We have

τ−1E(X(τ)kτ −X

(τ)(k−1)τ | X(τ)

(k−1)τ = x) = µ(x),

τ−1Var(X(τ)kτ −X

(τ)

(k−1)τ | X(τ)

(k−1)τ = x) = σ 2(x),

τ−2+δ

2 E

(∣∣∣X(τ)kτ −X

(τ)

(k−1)τ

∣∣∣2+δ | X(τ)kτ = x

)= E

(∣∣µ(x)√τ + σ(x)ε(k+1)τ∣∣2+δ)

<∞,

this inequality being uniform on any ball of radius r . The assumptions of Theorem 12.1 arethus satisfied.

12.3 One may take, for instance,

ωτ = ωτ, ατ = τ, βτ = 1− (1+ δ)τ.

It is then easy to check that the limits in (12.23) and (12.25) are null. The limiting diffusionis thus {

dXt = f (σt )dt + σtdW1t

dσ 2t = (ω − δσ 2

t )dt

The solution of the σ 2t equation is, using Exercise 12.1 with σ = 0,

σ 2t = e−δt

(σ 2

0 − ω/δ)+ ω/δ, t ≥ 0,

where σ 20 is the initial value. It is assumed that σ 2

0 >ω/δ and δ > 0, in order to guaranteethe positivity. We have

limt→∞ σ 2

t = ω/δ.

Page 450: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

434 GARCH MODELS

12.4 In view of (12.34) the put price is P(S, t) = e−r(T−t)Eπ [(K − ST )+ | St ]. We have seen that

the discounted price is a martingale for the risk-neutral probability. Thus e−r(T−t)Eπ [ST |St ] = St . Moreover,

(ST −K)+ − (K − ST )+ = ST −K.

The result is obtained by multiplying this equality by e−r(T−t) and taking the expectationwith respect to the probability π .

12.5 A simple calculation shows that ∂C(S,t)∂St

= �(xt + σ√τ) ∈ (0, 1).

12.6 In view of (12.36), Ito’s formula applied to Ct = C(S, t) yields

dCt =(∂Ct

∂t+ ∂Ct

∂StµSt + 1

2(σSt )

2 ∂2Ct

∂S2t

)dt + ∂Ct

∂StσStdWt := µtCtdt + σtCtdWt ,

with, in particular, σt = 1Ct

∂Ct∂St

σSt . In view of Exercise 12.5, we thus have

σt

σ= St�(xt + σ

√τ)

Ct

= St�(xt + σ√τ)

St�(xt + σ√τ)− e−rτK�(xt )

> 1.

12.7 Given observations S1, . . . , Sn of model (12.31), and an initial value S0, the maximumlikelihood estimators of m = µ− σ 2/2 and σ 2 are, in view of (12.33),

m = 1

n

n∑i=1

log(Si/Si−1), σ 2 = 1

n

n∑i=1

{log(Si/Si−1)− m}2.

The maximum likelihood estimator of µ is then

µ = m+ σ 2

2.

12.8 Denoting by φ the density of the standard normal distribution, we have

∂C(S, t)

∂σ= Stφ(xt + σ

√τ)

(∂xt

∂σ+√τ

)− e−rτKφ(xt)

∂xt

∂σ.

It is easy to verify that Stφ(xt + σ√τ) = e−rτKφ(xt). It follows that

∂C(S, t)

∂σ= St

√τφ(xt + σ

√τ)> 0.

The option buyer wishes to be covered against the risk: he thus agrees to pay more if theasset is more risky.

12.9 We have St = St−1er−σ2

t /2+σt η∗t where (η∗t )iid∼ N(0, 1). It follows that

E(St | It−1) = St−1 exp

(r − σ 2

t

2+ σ 2

t

2

)= erSt−1.

The property immediately follows.

Page 451: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 435

12.10 The volatility is of the form σ 2t = ω + a(ηt−1)σ

2t−1 with a(x) = ω + {α(x − γ )2 + β}σ 2

t−1.

Using the results of Chapter 2, the strict and second-order stationarity conditions are

E log a(ηt ) < 0 and Ea(ηt ) < 1.

We are in the framework of model (12.44), with µt = r + λσt − σ 2t /2. Thus the risk-neutral

model is given by (12.47), with η∗t = λ+ ηt and

σ 2t = ω + {α(η∗t−1 − λ− γ )2 + β}σ 2

t−1.

12.11 The constraints (12.41) can be written as

e−r = E exp(at + btηt+1 + ctη2t+1 | It ),

1 = E exp{at + btηt+1 + ctη2t+1 + Zt+1 | It }

= E exp{at + µt+1 + σt+1ηt+1 + btηt+1 + ctη2t+1 | It }.

It can easily be seen that if U ∼ N(0, 1) we have, for a < 1/2 and for all b,

E[exp{a(U + b)2}] = 1√1− 2a

exp

(ab2

1− 2a

).

Writing

btηt+1 + ctη2t+1 = ct

{ηt+1 + bt

2ct

}2

− b2t

4ct,

we thus obtain

1 = 1√1− 2ct

exp

(at + r + b2

t

2(1− 2ct )

), ct < 1/2;

and writing

σt+1ηt+1 + btη2t+1 = bt

{ηt+1 + σt+1

2bt

}2

− σ 2t+1

4bt,

we have

1 = 1√1− 2ct

exp

(at + µt+1 + (bt + σt+1)

2

2(1− 2ct )

).

It follows that(2bt + σt+1)σt+1

2(1− 2ct )= r − µt+1 =

σ 2t+1

2− λσt+1.

Thus2bt + σt+1

2(1− 2ct )= σt+1

2− λ.

There are an infinite number of possible choices for bt and ct . For instance, if λ = 0, one cantake bt = νσt+1 and ct = −ν with ν >−1/2. Then at follows. The risk-neutral probabilityπt,t+1 is obtained by calculating

Eπt,t+1(euZt+1 | It ) = E(eat+r+uµt+1+(bt+uσt+1)ηt+1+ct η2

t+1 | It)

= exp

{u

(r − σ 2

t+1

2(1− 2ct )

)+ u2 σ 2

t+1

2(1− 2ct )

}.

Page 452: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

436 GARCH MODELS

Under the risk-neutral probability, we thus have the model log (St /St−1) = r − σ 2t

2(1−2ct−1)+ ε∗t ,

ε∗t = σt√1−2ct−1

η∗t , (η∗t )iid∼ N(0, 1).

(C.18)

Note that the volatilities of the two models (under historical and risk-neutral probability) donot coincide unless ct = 0 for all t .

12.12 We have VaRt,1(α) = − log(2α)/λ. It can be shown that the distribution of Lt,t+2 has thedensity g(x) = 0.25λ exp{−λ|x|}(1+ λ|x|). At horizon 2, the VaR is thus the solution u ofthe equation (2+ λu) exp{−λu} = 4α. For instance, for λ = 0.1 we obtain VaRt,2(0.01) =51.92, whereas

√2VaRt,1(0.01) = 55.32. The VaR is thus underevaluated when the incorrect

rule is applied, but for other values of α the VaR may be overevaluated: VaRt,2(0.05) =32.72, whereas

√2VaRt,1(0.05) = 32.56.

12.13 We have

�Pt+i −m = Ai(�Pt −m)+ Ut+i + AUt+i−1 + · · · + Ai−1Ut+1.

Thus, introducing the notation Ai = (I − Ai)(I − A)−1,

Lt,t+h = −a′h∑i=1

m+ Ai(�Pt −m)+i∑

j=1

Ai−jUt+j

= −a′mh− a′AAh(�Pt −m)− a′

h∑j=1

h∑i=j

Ai−j

Ut+j

= −a′mh− a′AAh(�Pt −m)− a′h∑

j=1

Ah−j+1Ut+j .

The conditional law of Lt,t+h is thus the N(a′µt,h, a′ha) distribution, and (12.58) follows.

12.14 We have

�Pt+2 =√ω + α1�P

2t+1Ut+2 =

√ω + α1(ω + α1�P

2t )U

2t+1Ut+2.

At horizon 2, the conditional distribution of �Pt+2 is not Gaussian if α1 > 0, because itskurtosis coefficient is equal to

Et�P4t+2

(Et�P2t+2)

2= 3

(1+ 2θ2

t

(ω + θt )2

)> 3, θt = α1(ω + α1�P

2t ).

There is no explicit formula for Fh when h> 1.

12.15 It suffices to note that, conditionally on the available information It , we have

α = P

(pt+h − pt

pt<−VaRt (h, α)

pt| It)

= P

{rt+1 + · · · + rt+h < log

(1− VaRt (h, α)

pt

)| It}.

Page 453: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

SOLUTIONS TO THE EXERCISES 437

12.16 For simplicity, in this proof we will omit the indices. Since Lt,t+h has the same distributionas F−1(U), where U denotes a variable uniformly distributed on [0, 1], we have

E[Lt,t+h 1lLt,t+h >VaR(α)] = E[F−1(U) 1lF−1(U)>F−1(1−α)]

= E[F−1(U) 1lU > 1−α]

=∫ 1

1−αF−1(u)du =

∫ α

0F−1(1− u)du

=∫ α

0VaRt,h(u)du.

Using (12.63), the desired equality follows.

12.17 The monotonicity, homogeneity and invariance properties follow from (12.62) and from theVaR properties. For L3 = L1 + L2 we have

α{ES1(α)+ ES2(α)− ES3(α)}= E[L1(1lL1≥VaR1(α)− 1lL3≥VaR3(α))]+ E[L2(1lL2≥VaR1(α)− 1lL3≥VaR3(α))].

Note that(L1 − VaR1(α))(1lL1≥VaR1(α)− 1lL3≥VaR3(α)) ≥ 0

because the two bracketed terms have the same sign. It follows that

α{ES1(α)+ ES2(α)− ES3(α)} ≥ VaR1(α))E[1lL1≥VaR1(α)− 1lL3≥VaR3(α)]

+ VaR2(α))E[1lL2≥VaR2(α)− 1lL3≥VaR3(α)]

= 0.

The property is thus shown.

12.18 The volatility equation is

σ 2t = a(ηt−1)σ

2t−1, where a(x) = λ+ (1− λ)x2.

It follows thatσ 2t = a(ηt−1) . . . a(η0)σ

20 .

We have E log a(ηt ) < logEa(ηt ) = 0, the inequality being strict because the distributionof a(ηt ) is nondegenerate. In view of Theorem 2.1, this implies that a(ηt−1) . . . a(η0)→ 0a.s., and thus that σ 2

t → 0 a.s., when t tends to infinity.

12.19 Given, for instance, σt+1 = 1, we have rt+1 = ηt+1 ∼ N(0, 1) and the distribution of

rt+2 =√λ+ (1− λ)η2

t+1ηt+2

is not normal. Indeed, Ert+2 = 0 and Var(rt+2) = 1, but E(r4t+2) = 3

{1+ 2(1− λ)2

} = 3.Similarly, the variable

rt+1 + rt+2 = ηt+1 +√λ+ (1− λ)η2

t+1ηt+2

is centered with variance 2, but is not normally distributed because

E(rt+1 + rt+2)4 = 6

{1+ (2− λ)2} = 12.

Note that the distribution is much more leptokurtic when λ is close to 0.

Page 454: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 455: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Appendix D

Problems

Problem 1

The exercises are independent. Let (ηt ) be a sequence of iid random variables satisfying E(ηt ) = 0and Var(ηt ) = 1.

Exercise 1: Consider, for all t ∈ Z, the model{εt = σtηtσt = ω +∑q

i=1 αi |εt−i | +∑p

j=1 βjσt−j ,

where the constants satisfy ω> 0, αi ≥ 0, i = 1, . . . , q and βj ≥ 0, j = 1, . . . , p. We also assumethat ηt is independent of the past values of εt . Let µ = E|ηt |.1. Give a necessary condition for the existence of E|εt |, and give the value of m = E|εt |.2. In this question, assume that p = q = 1.

(a) Establish a sufficient condition for strict stationarity using the representation

σt = ω + a(ηt−1)σt−1,

and give a strictly stationary solution of the model. It will be assumed that this conditionis also necessary.

(b) Establish a necessary and sufficient condition for the existence of a second-order stationarysolution. Compute the variance of this solution.

3. Give a representation of the model which allows the coefficients to be estimated by least squares.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 456: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

440 GARCH MODELS

Exercise 2: The following parametric specifications have been introduced in the ARCH literature.

(i) Quadratic GARCH(p, q):

εt = σtηt , σ 2t =

(ω +

q∑i=1

αiεt−i

)2

+p∑j=1

βjσ2t−j .

(ii) Qualitative ARCH of order 1:

εt = σtηt , σt =I∑i=1

αi 1lAi (εt−1),

where the αi are real coefficients, (Ai, i = 1, . . . , I ) is a partition of R and 1lAi (x) is equalto 1 if x ∈ Ai , and to 0 otherwise.

(iii) Autoregressive stochastic volatility:

εt = σtηt , σt = ω + βσt−1 + σut ,

where the ut are iid variables with mean 0 and variance 1, and are independent of(ηt , t ∈ Z).Briefly discuss the dynamics of the solutions (in terms of trajectories and asymmetries),the constraints on the coefficients and why each of these models is of practical interest.

In the case of model (ii), determine maximum likelihood estimators of the coefficients αi ,assuming that the Ai are known and that ηt is normally distributed.

Exercise 3: Consider the model{εt = σ1t η1t + σ2t η2t

σ 2lt = ωl +

∑q

i=1 αliε2t−i +

∑p

j=1 βlj σ2l,t−j , l = 1, 2,

where (η1t , η2t ) is an iid sequence with values in R2 such that E(η1t ) = E(η2t ) = 0, Var(η1t ) =Var(η2t ) = 1 and Cov(η1t , η2t ) = 0; σ1t and σ2t belong to the σ -field generated by the past of εt ;and ωl > 0, αli ≥ 0 (i = 1, . . . , q), and βlj ≥ 0 (j = 1, . . . , p).

1. We assume in this question that there exists a second-order stationary solution (εt ). Show thatEεt = 0. Under a condition to be specified, compute the variance of εt .

2. Compute E(ε2t |εt−1, εt−2, . . .) and E(ε4

t |εt−1, εt−2, . . .). Do GARCH-type solutions of the formεt = σtηt exist, where σt belongs to the past of εt and ηt is independent of that past?

3. In the case where β1j = β2j for j = 1, . . . , p, show that (εt ) is a weak GARCH process.

Page 457: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 441

Solution

Exercise 1:1. We have

E|εt | = E(σt )|ηt | = µ

ω +q∑i=1

αiE|εt−i| +p∑j=1

βjE(σt−j )

,

and, putting r = max(p, q), m satisfies

m

{1−

r∑i=1

(αiµ+ βi)

}= µω.

The condition is thus∑r

i=1(αiµ+ βi) < 1.

2. .(a) We obtain a(ηt−1) = α1|ηt−1| + β1. Arguing as in the GARCH(1, 1) case, it can be shownthat a sufficient strict stationarity condition is E{log a(ηt )} < 0.

(b) First assume that there exists a second-order stationary solution (εt ). We have E(ε2t ) =

E(σ 2t ). Thus

E(ε2t ) = ω2 + (α2

1 + β21 + 2α1β1µ

)E(ε2

t−1)+ 2ω

(α1 + β1

µ

)E|εt−1|

and, since E(ε2t ) = E(ε2

t−1),

(1− α2

1 − β21 − 2α1β1µ

)E(ε2

t ) = ω2 + 2ω

(α1 + β1

µ

)E|εt−1|.

Thus we necessarily have

α21 + β2

1 + 2α1β1µ < 1.

One can then compute

E|εt | = µω

1− α1µ− β1, E(ε2

t ) =(1+ α1µ+ β1) ω

2

(1− α1µ− β1)(1− α2

1 − β21 − 2α1β1µ

) .Conversely, if this condition holds true, that is, if E

{a2(ηt )

}< 1, by Jensen’s inequality

we have

E{log a(ηt )} = 1

2E{log a2(ηt )} ≤ 1

2logE{a2(ηt )} < 0.

Thus there exists a strictly stationary solution of the form

εt = ηtω

∞∑k=0

a(ηt−1) . . . a(ηt−k),

Page 458: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

442 GARCH MODELS

with the convention that a(ηt−1) . . . a(ηt−k) = 1 if k = 0. Using Minkowski’s inequality, itfollows that

√Eε2

t = ω

E { ∞∑k=0

a (ηt−1) . . . a (ηt−k)

}21/2

≤ ω

∞∑k=0

[E{a (ηt )

2}]k/2<∞.

3. Assume that the condition of part 1 is satisfied. We have

E (|εt | | εu, u < t) = µσt .

Letut = |εt | − E (|εt | | εu, u < t)

be the innovation of |εt |. We know that (ut ) is a noise (if Eε2t <∞). Multiplying the equation

of σt by µ and replacing µσt−j by |εt−j | − ut−j , we obtain

|εt | −r∑

i=1

(αiµ+ βi)|εt−i | = ωµ+ ut −p∑

j=1

βjut−j .

This is an ARMA(r, p) equation, which can be used to estimate the coefficients by least squares.

Exercise 2:Model (i) displays some asymmetry: positive and negative past values of the noise do not havethe same impact on the volatility. Its other properties are close to the standard GARCH model:in particular, it should allow for volatility clustering. Positivity constraints are not necessary onω and the αi but are required for the βj . The volatility is no longer a linear function of the pastsquared values, which makes its study more delicate.

Model (ii) has a constant volatility on intervals. The impact of the past values depends onlyon the interval of the partition to which they belong. If the αi are well chosen, the largest valueswill have a stronger impact than the smallest values, and the impact could be asymmetric. If I islarge, then the model is more flexible, but numerous coefficients have to be estimated (and thus itis necessary to have enough observations within each interval).

Model (iii) is a stochastic volatility model. The process σt cannot be interpreted as a volatilitybecause it is not positive in general. Moreover, |σt | is not exactly the volatility of εt in the sensethat σ 2

t is not the conditional variance of εt . At least when the distributions of the two noisesare symmetric, the model should be symmetric and the trajectories should resemble those of aGARCH.

For model (ii), neglecting the initial value, up to a constant the log-likelihood has the form ofthat of the standard GARCH models:

Ln(θ) = −1

2

n∑t=1

{log σ 2

t (θ)+ε2t

σ 2t (θ)

}.

Thus the first-order conditions are

0 = ∂Ln(θ)

∂θ= −1

2

n∑t=1

∂σ 2t (θ)

∂θ

1

σ 2t (θ)

(1− ε2

t

σ 2t (θ)

).

Page 459: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 443

We have σ 2t (θ) = α2

i if εt−1 ∈ Ai . Let Ti = {t; εt−1 ∈ Ai} and let |Ti | be the cardinality of thisset. Thus

0 = ∂Ln(θ)

∂αi= −1

2

∑t∈Ti

2αi1

α2i

(1− ε2

t

α2i

)

and finally, the maximum likelihood estimator of αi is

αi = 1

|Ti |∑t∈Ti

ε2t .

Exercise 3:

1. We immediately have E(εt ) = 0 by the independence between the ηit and the past. Moreover,

E(ε2t ) = E(σ 2

1t + σ 22t ) = ω1 + ω2 +

q∑i=1

(α1i + α2i)E(ε2t−i )+

p∑j=1

β1jE(σ2l,t−j )+ β2jE(σ

22,t−j ).

Thus if β1j = β2j := βj for all j , we obtain

E(ε2t ) =

ω1 + ω2

1−∑q

i=1(α1i + α2i )−∑p

j=1 βj.

2. Let µi = E(η4it ). We have

E(ε2t | εu, u < t) = σ 2

1t + σ 22t ,

E(ε4t | εu, u < t) = µ1σ

41t + µ2σ

42t .

In general, there exists no constant k such that{E(ε2

t | εu, u < t)}2 = kE(ε4

t | εu, u < t),

which shows that (εt ) is not a strong GARCH process.

3. Let ut = ε2t − σ 2

1t − σ 22t be the innovation of ε2

t . Replacing σ 21,t−j + σ 2

2,t−j by ε2t−j − ut−j , we

have

ε2t = ω1 + ω2 +

q∑i=1

(α1i + α2i )ε2t−i + ut +

p∑j=1

βj(ε2t−j − ut−j

),

which shows that (ε2t ) satisfies an ARMA{max(p, q), p} equation.

Page 460: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

444 GARCH MODELS

Problem 2

The three parts of this problem are independent (except part III.3).Consider the model

εt = αtεt−1 + ωt,

where (αt ) and (ωt ) are two sequences of iid random variables with symmetric distributions suchthat E(αt ) = E(ωt) = 0 and

Var(αt ) = α > 0, Var(ωt ) = ω> 0,

and admitting moments of order 4 at least. Assume in addition that αt and ωt are independent ofthe past of εt (that is, independent of {εt−s , s > 0}).

Part I: Assume that the sequences (αt ) and (ωt ) are independent.

I.1. Verify that, in the sense of Definition 2.1, there exists an ARCH(1) solution (εt ), with E(ε2t ) <

∞. Show that, in general, (εt ) is not an ARCH process in the strong sense (Definition 2.2).

I.2. For k > 0, write εt as a function of εt−k and of variables from the sequences (αt ) and (ωt ).Show that

E log |αt | < 0

is a sufficient condition for the existence of a strictly stationary solution. Express this conditionas a function of α in the case where αt is normally distributed. (Note that if U ∼ N(0, 1),then E log |U | = −0.63.)

I.3. If there exists a second-order stationary solution, determine its mean, E(εt ), and its autoco-variance function Cov(εt , εt−h), ∀h ≥ 0.

I.4. Establish a necessary and sufficient condition for the existence of a second-order stationaritysolution.

I.5. Compute the conditional kurtosis of εt . Is it different from that of a standard strong ARCH?

I.6. Is the model stable by time aggregation?

Part II: Now assume that αt = λωt , where λ is a constant.

II.1. Do ARCH solutions exist?

II.2. What is the standard property of the financial series for which this model seems more appro-priate than that of part I? For that property, what should be the sign of λ?

II.3. Assume there exists a strictly stationary solution (εt ) such that E(ε4t ) <∞.

(a) Justify the equality Cov(εt , ε2t−h) = 0, for all h> 0.

(b) Compute the autocovariance function of (ε2t ).

(c) Prove that (εt ) admits a weak GARCH representation.

Part III: Assume that the variables αt and ωt are normally distributed and that they are correlated,with Corr(αt , ωt ) = ρ ∈ [−1, 1]. Denote the observations by (ε1, . . . , εn) and let the parameterθ = (α, ω, ρ).

Page 461: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 445

III.1. Write the log-likelihood Ln(θ) conditional on ε0.

III.2. Solve the normal equations∂

∂θLn(θ) = 0.

How can these equations be interpreted?

III.3. The model is first estimated under the constraint ρ = 0, and then without constraint, on aseries (rt ) of log-returns (rt = log(pt /pt−1)). The series contains 2000 daily observations.The estimators are assumed to be asymptotically Gaussian. The following results are obtained(the estimated standard deviations are given in parentheses):

α ω ρ Ln(θ)

Constrained model 0.54 0.001 0 −1275.2(0.02) (0.0002)

Unconstrained model 0.48 0.001 −0.95 −1268.2(0.05) (0.0003) (0.04)

Comment on these results. Can we accept the model of part I? Today, the price falls by1% compared to yesterday. What are the predicted values for the returns of the next twodays? How can we obtain prediction intervals?

Page 462: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

446 GARCH MODELS

Solution

Part I:I.1. The existence of E(ε2

t ) allows us to compute the first two conditional moments of εt :

E(εt |εu, u < t) = E(αt )εt−1 + E(ωt) = 0,

E(ε2t |εu, u < t) = E(ω2

t )+ E(α2t )ε

2t−1 = ω + αε2

t−1 := σ 2t ,

in view of the independence between αt and ωt on one hand, and between these two variablesand εu, u < t , on the other hand. These conditional expectations characterize an ARCH(1)process. This process is not an ARCH in the strong sense because the variables ε2

t /σ2t =

ε2t /(ω + αε2

t−1) are not independent (see I.5).

I.2. We have, for k > 0,

εt = αt . . . αt−k+1εt−k + ωt +k−1∑n=1

αt . . . αt−n+1ωt−n,

where the sum is equal to 0 if k = 1. Let us show that the series

zt = ωt +∞∑n=1

αt . . . αt−n+1ωt−n

is almost surely well defined. By the Cauchy root test, the series is almost surely absolutelyconvergent because

exp

{1

n

(n−1∑ =0

log |αt− | + log |ωt−n|)}

,

converges a.s. to exp (E log |αt |) < 1, by the strong law of large numbers. Moreover, we havezt = αtzt−1 + ωt . Thus (zt ) is a strictly stationary solution of the model. If αt = √αUt , whereUt ∼ N(0, 1), the condition is given by α < exp(1.26) = 3.528.

I.3. We have E(εt ) = 0, Cov(εt , εt−h) = 0, for all h> 0, and Var(εt ) = ω/(1− α).

I.4. In view of the previous question, a necessary condition is α < 1. To show that this conditionis sufficient, note that, by Jensen’s inequality, it implies the strict stationarity of the solution(zt ). Moreover,

E(z2t ) = E(ω2

t )+∞∑n=1

E(α2t ) . . . E(α

2t−n+1)E(ω

2t−n) = ω +

∞∑n=1

αn−1ω <∞.

I.5. Assuming E(ε4t ) <∞ and using the symmetry of the distributions of αt and ωt , we have

E(ε4t |εu, u < t) = E(ω4

t )+ 6ωαε2t−1 + E(α4

t )ε4t−1.

The conditional kurtosis of εt is equal to the ratio E(ε4t |εu, u < t)/σ 2

t . If this coefficient wereconstant we would have, for a constant K and for all t ,

E(ω4t )+ 6ωαε2

t−1 + E(α4t )ε

4t−1 = K(ω + αε2

t−1)2.

It is easy to see that this equation has no solution.

Page 463: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 447

I.6. The model satisfied by the process (ε∗t ) := (ε2t ) satisfies

ε∗t = α2t α2t−1ε2(t−1) + ω2t + α2tω2t−1 := α∗t ε∗t−1 + ω∗t .

The independence assumptions of the initial model are not all satisfied because α∗t and ω∗t arenot independent. The time aggregation thus holds only if a dependence between the variablesωt and αt is allowed in the model.

Part II:II.1. We have

E(ε2t |εu, u < t) = ω(λεt−1 + 1)2,

which is incompatible with the conditional variance of an ARCH process.

II.2. The model is asymmetric because the volatility depends on the sign of εt−1. We should haveλ < 0, so that the volatility increases more when the stock price falls than when it rises bythe same magnitude.

II.3. .(a) The equality follows from the independence between ωt and all the past variables.

(b) For h> 1,

Cov(ε2t , ε

2t−h) = Cov

{ω2t (λεt−1 + 1

}2, ε2

t−h)

= ωCov(λ2ε2

t−1 + 2λεt−1 + 1, ε2t−h)

= ωλ2Cov(ε2t−1, ε

2t−h),

using (a) (Cov(εt−1, ε

2t−h) = 0 for h> 1). For h = 1, we obtain

Cov(ε2t , ε

2t−1

) = ωCov(λ2ε2

t−1 + 2λεt−1 + 1, ε2t−1

)= ωλ2Var

(ε2t−1

)+ 2ωλE(ε3t−1

).

We have E(ε3t−1

) = E(ω3t−1

)E (λεt−2 + 1)3 = 0, because the distribution of ωt−1 is

symmetric. Finally, the relation

Cov(ε2t , ε

2t−h) = ωλ2Cov

(ε2t−1, ε

2t−h)

is true for h> 0. For h = 0 we have

E(ε2t ) =

ω

1− λ2ω, E(ε4

t ) =E(ω4

t )(6λ2E(ε2

t )+ 1)

1− λ4E(ω4t )

,

which allows us to obtain Var(ε2t ) and finally the whole autocovariance function of (ε2

t ).

(c) The recursive relation between the autocovariances of (ε2t ) implies that the process is an

AR(1) of the formε2t = ω + ωλ2ε2

t−1 + ut

where (ut ) is a noise. The process (εt ) is thus an ARCH(1) in the weak sense.

Page 464: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

448 GARCH MODELS

Part III:III.1. We have

Ln(θ) = −1

2

(n log 2π +

n∑t=1

log σ 2t +

n∑t=1

ε2t

σ 2t

),

whereσ 2t = ω + αε2

t−1 + 2ρ√αωεt−1.

III.2. The normal equations aren∑t=1

(1− ε2

t

σ 2t

)1

σ 2t

∂σ 2t

∂θ= 0,

with∂σ 2

t

∂θ=(ε2t−1 + ρεt−1

√ω/α, 1+ ρεt−1

√α/ω, 2

√αωεt−1

)′.

These equations can be interpreted, for n large, as orthogonality relations.

III.3. Note that the estimated coefficients are significant at the 5% level. The constrained model(and thus that of part I) is rejected by the likelihood ratio test. The sign of ρ is whatwas expected. The optimal prediction is 0, regardless of the horizon. We have E(ε2

t |εu, u <t) = σ 2

t and E(ε2t+1|εu, u < t) = ασ 2

t + ω. The estimated volatility for the next day is σ 2t =

0.48(log 0.99)2 + 0.001− 2× 0.95×√0.48× 0.001× log 0.99 = 0.0015. The 95% confi-dence intervals are thus:

at horizon 1, [−1.96σt ; 1.96σt ] = [−0.075; 0.075];at horizon 2, [−1.96

√ασ 2

t + ω; 1.96√ασ 2

t + ω] = [−0.081; 0.081].

Page 465: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 449

Problem 3

Let (ηt ) be a sequence of iid random variables such that E(ηt ) = 0 and Var(ηt ) = 1. Consider, forall t ∈ Z, the model {

εt = σtηtσ rt = ω +∑q

i=1 αi |εt−i |r +∑p

j=1 βjσrt−j ,

(D.1)

where r is a positive real. The other constants satisfy:

ω> 0, αi ≥ 0, i = 1, . . . , q, βj ≥ 0, j = 1, . . . , p. (D.2)

1. Assume in this question that p = q = 1.

(a) Establish a sufficient condition for strict stationarity and give a strictly stationary solutionof the model.

(b) Give this condition in the case where β1 = 0 and compare the conditions corresponding tothe different values of r .

(c) Establish a necessary and sufficient condition for the existence of a nonanticipative strictlystationary solution such that E|εt |2r <∞. Compute this expectation.

(d) Assume in this question that r < 0. What might be the problem with that specification?

2. Propose an extension of the model that could take into account the typical asymmetric propertyof the financial series.

3. Show that from a solution (εt ) of (D.1), one can define a solution (ε∗t ) of a standard GARCHmodel (with r = 2).

4. Show that if E|εt |2r <∞, then (|εt |r ) is an ARMA process whose orders will be given.

5. Assume that we observe a series (εt ) of log-returns (εt = log(pt /pt−1)). For different powersof |εt |, ARMA models are estimated by least squares. The orders of these ARMA models areidentified by information criteria. We obtain the following results, vt denoting a noise.

r Estimated model0.5 |εt |0.5 = 0.0002+ 0.4|εt−1|0.5 + vt − 0.5vt−1 − 0.2vt−2

1 |εt | = 0.00024+ 0.5|εt−1| + 0.2|εt−2| + vt − 0.3vt−1

1.5 |εt |1.5 = 0.00024− 0.5|εt−1|1.5 + vt

2 ε2t = 0.00049+ 0.1ε2

t−1 + vt − 0.3vt−1

What values of r are compatible with model (D.1)–(D.2)? Assuming that the distributionof ηt is known, what are the corresponding parameters αi and βj?

6. In view of the previous questions discuss the interest of the models defined by (D.1).

Page 466: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

450 GARCH MODELS

Solution

1. .(a) We have

σ rt = ω + α1|εt−1|r + β1σrt−1 = ω + (α1|ηt−1|r + β1)σ

rt−1 := ω + a(ηt−1)σ

rt−1.

Thus, for N > 0,

σ rt = ω

{1+

N∑n=1

a(ηt−1) . . . a(ηt−n)

}+ a(ηt−1) . . . a(ηt−N−1)σ

rt−N−1.

Proceeding as for the standard GARCH(1, 1) model, using the Cauchy root test, it is shownthat if

E log{a(ηt )} < 0

the process (ht ), defined by

ht = limN→∞

a.s.

{1+

N∑n=1

a(ηt−1) . . . a(ηt−n)

}ω,

exists, takes real positive values, is strictly stationary and satisfies ht = ω + a(ηt−1)ht .

A strictly stationary solution of the model is obtained by εt = h1/rt ηt . This solution is

nonanticipative, because εt is function of ηt and of its past.

(b) If β1 = 0, the condition is given by

α1 < e−rE log |ηt |.

Using the Jensen inequality, it can be seen that

E log |ηt | = E

{1

2log |ηt |2

}≤ 1

2logE{|ηt |2} = 0.

The conditions are thus less restrictive when r is larger.

(c) If (εt ) is nonanticipative and stationary and admits a moment of order 2r , we have E(ε2rt ) =

E(σ 2rt )E(η2r

t ) and, since ηt−1 and σt−1 are independent,

E(σ 2rt ) = E

{ω + a(ηt−1)σ

rt−1

}2

= ω2 + 2ωE {a(ηt−1)}E(σ rt−1

)+ E{a(ηt−1)

2}E (σ 2rt−1

).

Thus, by stationarity,[1− E

{a(ηt )

2}]E (σ 2rt

) = ω2 + 2ωE {a(ηt )}E(σrt)> 0.

A necessary condition is thus E{a(ηt )

2}< 1. Conversely, if this condition is satisfied we

have

E(h2t ) ≤ 2ω2

∞∑n=1

[E{a(ηt )

2}]n <∞.

The stationary solution that was previously given thus admits a moment of order 2r . Usingthe previous calculation, we obtain

E(ε2rt

) = E(η2rt

) ( ω2

1− E{a(ηt )2

} + 2ω2E {a(ηt )}[1− E{a(ηt)2}] [1− E {a(ηt )}]

).

Page 467: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 451

(d) If r < 0, a(ηt ) is not defined when ηt = 0. This is the same for the strictly stationarysolution, when one of the variables ηt−j is null. However, the probability of such an eventis null if P(ηt = 0) = 0.

2. The specification

σ rt = ω +q∑i=1

(αi,+ 1lεt−i > 0+αi,− 1lεt−i<0

) |εt−i |r + p∑j=1

βjσrt−j ,

with αi,+> 0 and αi,−> 0, induces a different impact on the volatility at time t for the positiveand negative past values of εt . For αi,+ = αi,− we retrieve model (1).

3. Let ε∗t = |εt |r/2νt , where (νt ) is an iid process, independent of (εt ) and taking the values −1and 1 with probability 1/2. We have ε∗t = σ ∗t η∗t with

σ ∗2t = σ rt = ω +

q∑i=1

αi |εt−i |∗2 +p∑j=1

βjσ∗2t−j ,

and (η∗t ) = (|ηt |r/2νt ) is an iid process with mean 0 and variance 1. The process (ε∗t ) is thus astandard GARCH.

4. The innovation of |εt |r is defined by

ut = |εt |r −E(|εt |r | εt−1, εt−2, . . .

) = |εt |r − σ rt µr

with E|ηt |r = µr . It follows that

|εt |r = ωµr +max(p,q)∑

i=1

(αiµr + βi)|εt−i |r + ut −p∑j=1

βjut−j

and (|εt |r ) is an ARMA{max(p, q), p} process.

5. Noting that the order and coefficients of the AR part in the ARMA representation are greater (inabsolute value) than those of the MA part, we see that only the model for r = 1 is compatiblewith the class of model that is considered here. We have r = p = 1 and q = 2, and the estimatedcoefficients are

β1 = 0.3, α1 = 0.5− β1

µ1= 0.2

µ1, α2 = 0.2.

6. The proposed class can take into account the same empirical properties as the standard GARCHmodels, but is more flexible because of the extra parameter r .

Page 468: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

452 GARCH MODELS

Problem 4

Let (ηt ) be a sequence of iid random variables such that E(ηt ) = 0, Var(ηt ) = 1, E(η4t ) = µ4. Let

(at ) be another sequence of iid random variables, independent of the sequence (ηt ), taking thevalues 0 and 1 and such that

P(at = 1) = p, P (at = 0) = 1− p, 0 ≤ p ≤ 1.

Consider, for all t ∈ Z, the model

εt = {σ1t at + σ2t (1− at )}ηt , (D.3)

σ 21t = ω1 + α1ε

2t−1, σ 2

2t = ω2 + α2ε2t−1, (D.4)

whereωi > 0, αi ≥ 0, i = 1, 2.

A solution such that εt is independent of the future variables ηt+h and at+h, h> 0, is callednonanticipative.

1. What are the values of the parameters corresponding to the standard ARCH(1)? What kind oftrajectories can be obtained with that specification?

2. In order to obtain a strict stationarity condition, write (D.4) in the form

Zt :=(

σ 21tσ 2

2t

)=(

ω1

ω2

)+ At−1

(σ 2

1,t−1σ 2

2,t−1

)(D.5)

where At−1 is a matrix depending on ηt−1, at−1, α1, α2.

3. Deduce a strict stationarity condition for the process (Zt ), and then for the process (εt ), asfunction of the Lyapunov coefficient

γ = limt→∞ a.s.

1

tlog ‖AtAt−1 . . . A1‖

(justify the existence of γ and briefly outline the steps of the proof). Note that At is the productof a column vector by a row vector. Deduce the following simple expression for the strictstationarity condition:

αp

1 α1−p2 < c,

for a constant c that will be specified. How can the condition be interpreted?

4. Give a necessary condition for the existence of a second-order and nonanticipative station-ary solution. It can be assumed that this condition implies strict stationarity. Deduce that thenecessary second-order stationarity condition is also sufficient. Compute the variance of εt .

5. We now consider predictions of future values of εt and of its square. Give an expression forE(εt+h|εt−1, εt−2, . . .) and E(ε2

t+h|εt−1, εt−2, . . .), for h> 0, as a function of εt−1.

6. What is the conditional kurtosis of εt? Is there a standard ARCH solution to the model?

7. Assuming that the distribution of ηt is standard normal, write down the likelihood of the model.For a given series of 2000 observations, a standard ARCH(1) model is estimated, and thenmodel (D.3)–(D.4) is estimated. The estimators are assumed to be asymptotically Gaussian.

Page 469: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 453

The results are presented in the following table (the estimated standard deviations are given inparentheses, Ln(·) denotes the likelihood):

ω1 α1 ω2 α2 p logLn(θ )

ARCH(1) 0.002 0.6 – – – −1275.2(0.001) (0.2)

Model (D.3)–(D.4) 0.001 0.10 0.005 1.02 0.72 −1268.2(0.001) (0.03) (0.000) (0.23) (0.12)

Comment on these results. Can we accept the general model? (we have P{χ2(3)> 7.81

} =0.05).

8. Discuss the estimation of the model by OLS.

Page 470: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

454 GARCH MODELS

Solution

1. The standard ARCH(1) is obtained for α1 = α2, ∀p. It is also obtained for p = 0, ∀α1, α2 andfor p = 1, ∀α1, α2. The trajectories may display abrupt changes of volatility (for instance, if ω1

and ω2 are very different).

2. We obtain equation (D.5) with

At−1 =(

α1η2t−1a

2t−1 α1η

2t−1(1− a2

t−1)

α2η2t−1a

2t−1 α2η

2t−1(1− a2

t−1)

).

3. The existence of γ requires E log+ ‖At‖ <∞. This condition is satisfied because E‖At‖ <∞,

for example with the norm defined by ‖A‖ =∑ |aij |. The strict stationarity condition γ < 0is shown as for the standard GARCH. Under this condition the strictly stationary solution of(D.5) is given by

Zt =(I +

∞∑i=0

AtAt−1 . . . At−i

)(ω1

ω2

).

Note that

At =(

α1η2t

α2η2t

) (a2t 1− a2

t

).

Thus

AtAt−1 = η2t η

2t−1

{α1a

2t + α2(1− a2

t )}( α1η

2t

α2η2t

) (a2t−1 1− a2

t−1

),

AtAt−1 . . . A1 = η21

t−2∏i=0

η2t−i{α1a

2t−i + α2(1− a2

t−i )}( α1η

2t

α2η2t

) (a2

0 1− a20

),

‖AtAt−1 . . . A1‖ = η21

t−2∏i=0

η2t−i{α1a

2t−i + α2(1− a2

t−i )} ∥∥∥∥( α1η

2t

α2η2t

)(a2

0 1− a20)

∥∥∥∥ ,because α1 and α2 are positive. It follows that, by the strong law of large numbers,

1

tlog ‖AtAt−1 . . . A1‖ = 1

t

t−1∑i=0

log η2t−i +

1

t

t−2∑i=0

log{α1a

2t−i + α2(1− a2

t−i )}

+1

tlog

∥∥∥∥( α1η2t

α2η2t

) (a2

0 1− a20

)∥∥∥∥→ E log η2

t + E log{α1a

2t + α2(1− a2

t )}

almost surely as t →∞. The second expectation is equal to

p logα1 + (1− p) logα2 = logαp1 α1−p2 .

It follows thatγ < 0 ⇐⇒ α

p

1 α1−p2 < exp

{−E(log η2t )}.

We note that the condition is satisfied even if one of the coefficients, for instance α1, is large,provided that the corresponding probability, p, is not too large.

Page 471: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 455

4. If (εt ) is second-order stationary, we have

Eε2t = p

{ω1 + α1E(ε

2t )}+ (1− p)

{ω2 + α2E(ε

2t )}.

The necessary condition is thus

α := pα1 + (1− p)α2 < 1,

and we have

Var(εt ) = pω1 + (1− p)ω2

1− pα1 − (1− p)α2.

Conversely, suppose that this condition is satisfied and that it implies that γ < 0. We have

E(At) =(

α1p α1(1− p)

α2p α2(1− p)

).

This matrix has rank 1, and thus admits a zero eigenvalue and a nonzero eigenvalue that isequal to its trace, α. This coefficient, less than 1 by assumption, is also the spectral radiusof E(At). It follows that the expectation of the stationary solution Zt defined above is finitebecause EAtAt−1 . . . At−i = {E(At)}i+1.

5. We clearly have E(εt+h | εt−1, εt−2, . . .) = 0. Since at (1− at ) = 0 we have

ε2t+h =

[{ω1a

2t+h + ω2(1− a2

t+h)}+ {α1a

2t+h + α2(1− a2

t+h)}ε2t+h−1

]η2t+h

:= ωt+h + αt+hε2t+h−1

= ωt+h + αt+hωt+h−1 + · · · + αt+h . . . αt+1ωt + αt+h . . . αt ε2t−1.

Letting ω = Eωt and since α = Eαt , it follows that

E(ε2t+h | εt−1, . . .

) = ω(1+ α + · · · + αh

)+ αh+1ε2t−1

for h> 0.

6. The conditional kurtosis is equal to

E(ε4t | εt−1, . . .){

E(ε2t | εt−1, . . .)

}2 =E(η4

t ){E(η2

t )}2

pσ 41t + (1− p)σ 4

2t{pσ 2

1t + (1− p)σ 22t

}2 .

This coefficient depends on t in general, which shows that there is no standard GARCH solutionto this model (except in the cases mentioned in part 1).

7. The conditional density of εt is written as

lt = p1√

2πσ1texp

{− ε2

t

2σ 21t

}+ (1− p)

1√2πσ2t

exp

{− ε2

t

2σ 22t

}

and the log-likelihood of the sample is the product of the lt for t = 1, . . . , n. The estimationresults display very different estimated coefficients α1 and α2. Moreover, the likelihood ratio testis equivalent to comparing the difference of the log-likelihoods and the quantile of order 1− α

of a χ 2(3). Since we have 2× (1275.2− 1268.2)> 7.81, the standard ARCH(1) is rejected infavor of the general model at the 5% level.

Page 472: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

456 GARCH MODELS

Problem 5

Let (ηt ) be a sequence of iid random variables, such that E(ηt ) = 0. When E|ηt |m <∞, denoteµm = Eηmt . Consider the model

εt = ηt + bηtεt−1, t ∈ Z. (D.6)

1. Strict stationarity

(a) Let

Zt,n = ηt +n∑i=1

biηtηt−1 · · · ηt−i .

Show that if E log |bηt | < 0 then the sequence (|Zt,n|)n≥1 converges almost surely. Underthis condition, let

Zt = ηt +∞∑i=1

biηtηt−1 · · · ηt−i .

(b) Show that if E log |bηt | < 0 then equation (D.6) admits a nonanticipative and ergodic strictlystationary solution.

(c) We have ∫ ∞

−∞log |x| 1√

2πexp

{−x

2

2

}dx = −0.635181.

Give the strict stationarity condition when ηt ∼ N(0, µ2).

2. Second-order stationarity

(a) Under what condition is (Zt,n)n a Cauchy sequence in L2?

(b) Show that b2µ2 < 1 entails E log |bηt | < 0.

(c) Show that if b2µ2 < 1 then (εt ) = (Zt ) is the second-order stationary solution of (D.6).

(d) Assume that µ2 = 0. Show that the condition b2µ2 < 1 is also necessary for the existenceof a nonanticipative second-order stationary solution.

3. Properties of the marginal distribution and conditional moments Assume that b2µ2 < 1and that (εt ) is the second-order stationary solution of (D.6).

(a) Show that (εt ) is a weak GARCH process whose orders will be specified.

(b) Compare the conditional variance of (εt ) with that of a strong ARCH(1). Does the signof εt−1 have an impact on the volatility at time t? Is this property of interest for financialseries?

4. Estimation Denote by b0 and µ02 the true value of the parameters b and µ2. Assume thatb2

0µ02 < 1 and that ε1, . . . , εn is a second-order stationary realization of model (D.6). Let

ht = ht (b, µ2) = µ2(1+ bεt−1)2 and h0t = ht(b0, µ02).

(a) What is the interpretation of νt = ε2t − h0t = (η2

t − µ02)(1+ b0εt−1)2?

Page 473: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 457

(b) Assume that Eε4t <∞. Show that, almost surely,

limn→∞

2

n

n∑t=2

νt (h0t − ht ) = 0.

(c) Assume that the distribution of εt is not concentrated on one or two points (in particular,µ2 = 0). Show that

E(h0t − ht )2 = 0 if and only if b = b0 and µ2 = µ02.

(d) Under the previous assumptions, consider the criterion

Qn(b, µ2) = 1

n

n∑t=2

{ε2t − ht

}2.

Show that, almost surely,

limn→∞Qn(b, µ2) ≥ lim

n→∞Qn(b0, µ02)

with equality if and only if b = b0 and µ2 = µ02. Give an estimation method for theparameters.

(e) Describe the quasi-maximum likelihood method.

5. Extension Without giving detailed proofs, extend the stationarity and estimation results to themodel

εt = ηt + b1ηtεt−1 + · · · + bqηtεt−q , t ∈ Z.

6. Further extensions Assume that µ2 = 0, µ3 = 0 and µ4b4 < 1.

(a) Compute the autocovariance function of ε2t . Prove that ε2

t follows a weak ARMA modelwhose orders will be given.

(b) Propose a moment estimator for µ2 and b2. Show that this estimator is consistent.

(c) When they exist, compute the matrices

I = limn→∞Var

{√n

(∂Qn(b0,µ02)

∂b2∂Qn(b0,µ02)

∂µ2

)}

and

J = limn→∞

∂2Qn(b0,µ02)

∂b22

∂2Qn(b0,µ02)∂b2∂µ2

∂2Qn(b0,µ02)∂µ2∂b2

∂2Qn(b0,µ02)

∂µ22

.

What moment condition is necessary for the existence of I and J ?

(d) Give the scheme of proof which would establish that, under some assumptions, the least-squares estimator θ of the parameter θ0 = (b0, µ02)

′ satisfies

√n(θ − θ0)

L→ N (0, (θ0)) ,

where (θ0) = J−1IJ−1.

Page 474: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

458 GARCH MODELS

(e) Let

t (θ) = loght + ε2t

ht, ht = µ2(1+ bεt−1)

2.

When it exists, compute

JQML = E

(∂2 t (θ0)

∂θ∂θ ′

)and show that

JQML = E

(1

h2t (θ0)

∂ht (θ0)

∂θ

∂ht (θ0)

∂θ ′

)= (κη − 1

)Var

∂ t (θ0)

∂θ,

whereκη = µ04

µ202

.

What moment condition is necessary for the existence of JQML?

(f) Give the scheme of proof which would establish that, under some assumptions, the quasi-maximum likelihood estimator θQML satisfies

√n(θQML − θ0)

L→ N (0,(θ0)QML

).

(g) What is the particular form of QML(θ0) at θ0 = (0, µ02)′? What are the consequences on

the asymptotic properties of the estimator?

(h) Compare (θ0) and QML(θ0) at θ0 = (0, µ02)′.

(i) Without giving detailed proofs, extend the stationarity and QML estimation results to themodel

εt = ηt + b1ηtεt−1 + · · · + bqηt εt−q , t ∈ Z.

Page 475: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 459

Solution

1. .(a) In view of the Cauchy root test, it suffices to show that almost surely

limi→∞

|biηt · · · ηt−i |1/i < 1.

By the law of large numbers, this limit is equal to

limi→∞

exp

{1

i

(i∑

k=1

log |bηt−k| + log |ηt |)}

= exp{E log |bηt |},

which shows the result.

(b) For all n, we haveZt,n = ηt + bηtZt−1,n−1.

Taking the limit, Zt = ηt + bηtZt−1, which shows that (εt ) = (Zt ) is a nonanticipativesolution of (D.6). Since Zt = f (ηt , ηt−1, . . . ) (where f : R∞ → R is measurable) and (ηt )is ergodic and stationary, (Zt ) is also stationary and ergodic.

(c) We have E log∣∣ηt/√µ2

∣∣ = −0.635181. The stationarity condition is thus written as

E log

∣∣∣∣b√µ2ηt√µ2

∣∣∣∣ = log |b√µ2| + E log

∣∣∣∣ ηt√µ2

∣∣∣∣ < 0,

or equivalently|b|õ2 < exp {0.635181} = 1.88736.

2. .(a) For n < m, we have

E

{m∑i=n

biηt · · · ηt−i}2

=m∑i=n

b2iµi+12 → 0

as n,m→∞ (that is, the sequence is a Cauchy sequence) if and only if

b2µ2 < 1.

(b) If b2µ2 < 1 then, using the Jensen inequality, we have

E log |bηt | = 1

2E log b2η2

t ≤1

2logEb2η2

t =1

2log b2µ2 < 0.

(c) When b2µ2 < 1, we have seen that (Zt,n)n is a Cauchy sequence. Thus it converges in L2

to some limit Zt . It also converges almost surely to Zt . Thus Zt = Zt almost surely, andEZ2

t <∞. To show the uniqueness of the solution, assume the existence of two second-order stationary solutions (Zt ) and (Z∗t ). Then, for any n ≥ 1,

Zt − Z∗t = bηt (Zt−1 − Z∗t−1) = bnηtηt−1 · · · ηt−n+1(Zt−n − Z∗t−n).

By the Cauchy–Schwarz and triangular inequalities,

E|Zt − Z∗t | ≤ |b|nµn/22

{‖Z1‖2 + ‖Z∗1‖2}.

Since this is true for all n, the condition b2µ2 < 1 entails E|Zt − Z∗t | = 0, which impliesZt = Z∗t almost surely.

Page 476: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

460 GARCH MODELS

(d) If εt is a second-order stationary solution then

Eε2t = µ2 + b2µ2Eε

2t ,

that is, (1− b2µ2

)Eε2

t = µ2.

If b2µ2 were greater than 1, the left-hand side of the previous inequality would be negative,which is impossible because the right-hand side is strictly positive.

3. .(a) Such a solution is nonanticipative and satisfies

Eεt = Eηt + bEηtEεt−1 = 0,

Eε2t =

µ2

1− b2µ2,

Cov (εt , εt−h) = 0, ∀h> 0.

It is thus a white noise. Let us show that (ε2t ) is an ARMA process. Using the independence

between ηt and εt−k and E(η2t ) = µ2, we have, for k > 0,

Cov(ε2t , ε

2t−k) = Cov(η2

t + 2bη2t εt−1 + b2η2

t ε2t−1, ε

2t−k)

= 2bCov(η2t εt−1, ε

2t−k)+ b2Cov(η2

t ε2t−1, ε

2t−k)

= 2bµ2Cov(εt−1, ε2t−k)+ b2µ2Cov(ε2

t−1, ε2t−k).

For k > 1,

Cov(εt−1, ε2t−k) = E(εt−1ε

2t−k) = E(ηt−1(1+ bεt−2)ε

2t−k) = 0.

It follows that, for k > 1,

Cov(ε2t , ε

2t−k) = b2µ2Cov(ε2

t−1, ε2t−k),

which shows that (ε2t ) admits an ARMA(1, 1) representation. Finally, (εt ) admits a weak

GARCH(1, 1) representation.

(b) The volatility of the model is

µ2(1+ bεt−1)2 = µ2 + b2µ2ε

2t−1 + 2bµ2εt−1,

whereas it is of the formω + αε2

t−1

for an ARCH(1). The sign of εt−1 is thus important. If b < 0, a negative return εt−1 increasesthe volatility more than it does the return −εt−1 > 0. Such an asymmetry in the shocks isobserved in real series, but is not taken into account by standard GARCH models.

4. .(a) Since h0t is the conditional expectation of ε2t , νt is the strong innovation of ε2

t .

(b) The process {νt (h0t − ht )}t is ergodic and stationary, by arguments already used. Theergodic theorem entails that

limn→∞

2

n

n∑t=2

νt (h0t − ht ) = E(η2t − µ02)Eh0t (h0t − ht ) = 0 a.s.

because h0t (h0t − ht ) is independent of (η2t − µ02), as measurable function of {ηu, u ≤

t − 1}.

Page 477: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 461

(c) We have E(h0t − ht )2 = 0 if and only if

h0t − ht = (µ02b20 − µ2b

2)ε2t−1 + 2(µ02b0 − µ2b)εt−1 + (µ02 − µ2) = 0 a.s.

This second-order equation in εt−1 (or in εt by stationarity) admits a solution if and only ifthe coefficients are null, that is, if and only if b = b0 and µ2 = µ02.

(d) Using the two last questions, almost surely,

limn→∞Qn(b, µ2) = lim

n→∞1

n

n∑t=2

{ε2t − h0t + h0t − ht

}2

= limn→∞Qn(b0, µ02)+ E(h0t − ht )

2 + 0

≥ limn→∞Qn(b0, µ02)

with equality if and only if b = b0 and µ2 = µ02. This suggests looking for a value of(b, µ2) minimizing the criterion Qn(b,µ2). This is the least-squares method.

(e) If ηt is N(0, µ02) distributed then the distribution of εt given {εu, u < t} is N(0, h0t ).Given the initial value ε1, the quasi-log likelihood of ε2, . . . , εn is thus

Ln(b, µ2) = −n2

log 2π − 1

2

n∑t=2

{loght (b, µ2)+ ε2

t

ht (b, µ2)

}.

A quasi-maximum likelihood estimator satisfies

(b, µ2) = arg max(b,µ2)∈�

Ln(b, µ2),

where � ⊂ R×]0,∞[ is the parameter space. If � is a compact set, since the criterion iscontinuous, there always exists at least one QMLE.

Page 478: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

462 GARCH MODELS

Problem 6

Consider the model {εt = σtηtσ 2t = ω(ηt−1)+ αε2

t−1 + βσ 2t−1,

(D.7)

where (ηt ) is an iid sequence of random variables with mean 0 and variance 1 and finite momentsof order 4 at least, and where ω(·) is a function with strictly positive values. Let ω = E{ω(ηt )}.1. What is the difference between this model and the standard GARCH(1, 1) model and why is

it of interest for the modeling of financial series? An example of simulated trajectory of themodel is given in Figure D.1.

2. Strict stationarity

(a) Show that under the assumptionE log a(ηt ) < 0,

where a is a function that will be specified, the model admits a unique nonanticipativestrictly stationary solution.

(b) Show that if E log a(ηt )> 0, the model does not admit a strictly stationary solution.

3. Second-order stationarity, kurtosis

(a) Establish the necessary and sufficient condition for the existence of a second-order stationarysolution, and compute E(ε2

t ). Prove that the process has the same second-order propertiesas a standard GARCH(1, 1) (that is, with ω(·) constant).

(b) Assuming that the fourth-order moments exist, compare the kurtosis coefficients of theseprocesses.

4. Asymmetries Give an example of a specification of ω that can take into account the usualasymmetry property of the financial series.

5. ARMA representations

(a) Denote by νt = ε2t −E

(ε2t | εu, u < t

)the innovation of ε2

t . Show that, under assumptionsto be specified, we have

ε2t = ω + (α + β)ε2

t−1 + ut , where ut = νt − βνt−1 + ω(ηt−1)− ω.

20 40 60 80 100

−10

−5

5

20 40 60 80 100

−10

−5

5

Figure D.1 Simulated trajectory of model (D.7) with α = 0.2, β = 0.5 and ω(ηt−1) = 4 (left)ω(ηt−1) = 1+ η4

t−1 (right), for the same sequence of variables ηt ∼ N(0, 1).

Page 479: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 463

(b) Show that (ut ) is an MA(1) process. Prove that ε2t admits an ARMA(1, 1) representation.

Is this representation different from that obtained for the standard GARCH(1, 1) such thatω(ηt−1) = ω? (The case β = 0 can be considered.)

6. Estimation and tests

(a) With the aid of part 5, note that the autocorrelation function ρ(h) of the process (ε2t ) satisfies

ρ(h) = αρ(h− 1), for h> 1. Give a simple estimator of α that does not depend on thespecification of ω. The following values were obtained for the first empirical autocorrelationsof (ε2

t ):ρ(1) = 0.445, ρ(2) = 0.219, ρ(3) = 0.110, ρ(4) = 0.056.

Give an estimate of α. Is a standard ARCH(1) model (β = 0 and ω constant) plausible forthese data?

(b) Assume that the function ω(·) is parameterized by some parameter γ : for example,ω(ηt−1) = 1+ γ η2

t−1 with γ > 0. Consider the estimation of θ = (γ, α, β)′ using theobservations ε1, . . . , εn. Write down the quasi-maximum likelihood criterion, given initialvalues for εu, u < 1.

7. Extension Outline how the previous results are modified if ω(ηt−1) is replaced by ω(ηt−k) in(D.7), with k > 1.

Page 480: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

464 GARCH MODELS

Problem 7

Consider the ARCH models

(I) εt = σtηt , σ2t = ω + αε2

t−1,

(II) εt = σtηt , σ2t = ω + αε2

t−2,

where ω> 0, α ≥ 0, and (ηt ) denotes an iid sequence of random variables, such that E(ηt ) = 0and Var(ηt ) = 1.

1. Show that the strict stationarity condition is the same for the two models, and show that thiscondition implies the existence of a unique strictly stationary nonanticipative solution. For eachmodel, give the unique strictly stationary nonanticipative solution. Prove that, when it exists,the expectation Ef (εt ) of any given function f of εt is the same in the two models.

2. From the observation of empirical autocovariances of ε2t , how can we determine the data

generating process between model (I) and model (II)?

3. For model (II), write down the likelihood and the equations that allow ω and α to be estimated(without trying to solve these equations). Recall that the asymptotic variance of the quasi-maximum likelihood estimator is (Eη4

t − 1)J−1, where

J = Eθ0

(1

σ 4t (θ0)

∂σ 2t (θ0)

∂θ

∂σ 2t (θ0)

∂θ ′

).

Compare the asymptotic variances of the estimators of θ = (ω, α)′ in the two models. In eachmodel, how is the hypothesis α = 0 tested?

Page 481: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 465

Problem 8

Exercise 1: Recall that the autocorrelation function ρ(·) of a second-order stationary ARMA(p, q)process satisfies

ρ(h) =p∑i=1

φiρ(h− i), h>q,

where the φi are the AR coefficients. Denote by B the lag operator: BXt = Xt−1. Let (εt ) be astrictly stationary solution of a GARCH(p, q) model such that E(ε4

t ) <∞.

1. Show that (ε2t ) admits an ARMA representation. What is the relation satisfied by the autocor-

relation function ρε2 of this process?

2. The aim of this question is to check that the function ρε2 has positive values.

(a) Show the property when p = q = 1.

(b) Using the ARMA representation of ε2t , show that there exist constants ci and a noise νt

such that

ε2t = c0 +

∞∑i=1

ciνt−i .

(c) Let P and Q be two polynomials with positive coefficients such that P(0) = Q(0) = 0.Assume that 1−Q(z) = 0 implies that |z|> 1. Show by induction that {1−Q(B)}−1P(B)

is a series in B with positive terms.

(d) Prove that the coefficients ci are positive.

(e) Deduce the property.

3. The aim of this question is to show that the function ρε2 is not always decreasing.

(a) Verify that for a GARCH(1, 1) model the function ρε2 is decreasing.

(b) Assume that q = 2 and p = 0, that is, εt =√ω + α1ε

2t−1 + α2ε

2t−2 ηt .

(i) What is the relationship between ρε2(h), ρε2(h− 1) and ρε2 (h− 2) for h> 0?

(ii) Deduce an expression for {ρε2(1)}−1ρε2(2) as a function of α1 and α2.

(iii) Prove that for some set of values of α1 and α2, the function ρε2 is not decreasing.

Exercise 2: Consider the ARCH(q) model{εt = σtηt ,

σ 2t = ω0 +

∑q

i=1 α0iε2t−i ,

where ω0 > 0, α0i ≥ 0, i = 1, . . . , q, and (ηt ) is an iid sequence such that E(ηt ) = 0, andVar(ηt ) = 1. Let ε1, . . . , εn be n observations of the process (εt ) and let ε0, . . . , ε1−q be initialvalues. Introduce the vector Zt−1 ∈ Rq defined by Z′t−1 =

(1, ε2

t−1, . . . , ε2t−q), and the n× q

matrix X and the n× 1 vector Y given by

X =

1 ε20 . . . ε2

−q+1...

1 ε2n−1 . . . ε2

n−q

= Z′0

...

Z′n−1

, Y =

ε21...

ε2n

.

Page 482: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

466 GARCH MODELS

Denote by θ0 = (ω0, α01, . . . , α0q)′ the true value of the parameter.

1. Show that the OLS estimator θ = (ω, α1, . . . , αq)′ of θ0 is given by

θ = (X′X)−1X′Y.

We use the notation σ 2t (θ ) = ω +∑q

i=1 αiε2t−i and εt = {σt (θ )}−1εt .

2. Give conditions ensuring the following almost sure convergences as n→∞:

1

nX′X→ Eθ0 (Zt−1Z

′t−1),

1

nX′Y → Eθ0(Zt−1ε

2t ).

3. Prove that θ converges almost surely to θ0.

4. In order to take into account the conditional heteroscedasticity, define the weighted least-squaresestimator

θ = (X′X)−1X′Y ,

where

X =

σ−21 (θ) ε2

0 . . . ε2−q+1

...

σ−2n (θ) ε2

n−1 . . . ε2n−q

, Y =

ε21...

ε2n

.

(a) Without going into all the mathematical details, justify the introduction of such an estimator.

(b) Show that

√n(θ − θ0) =

(1

n

n∑t=1

Zt−1Z′t−1

{σt (θ )}4

)−1 {1√n

n∑t=1

Zt−1(η2t − 1)

{σt (θ )}2

}. (D.8)

(c) Let J = E(σ−4t Zt−1Z

′t−1). Justify the following results:

1

n

n∑t=1

σ−4t Zt−1Z

′t−1 → J, a.s.,

1√n

n∑t=1

σ−2t Zt−1(η

2t − 1)

L→ N(0, (µ4 − 1)J ).

(d) Assume that the asymptotic distribution of the right-hand side of (D.8) does not changewhen σ 2

t (θ ) is replaced by σ 2t . Deduce the asymptotic distribution of

√n(θ − θ0).

Page 483: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 467

Problem 9

For any random variable X, denote

X+ = max{X, 0}, X− = min{X, 0}.

Thus X+ ≥ 0 and X− ≤ 0 almost surely. Consider the threshold GARCH(1, 1) model, orTGARCH(1, 1), defined by{

εt = σtηtσt = ω + α+ε+t−1 − α−ε−t−1 + βσt−1,

(D.9)

where (ηt ) is a centred iid sequence with variance 1, and where ω is strictly positive, and α+, α−and β are nonnegative numbers. In model (D.9), the parameter β is called the shock persistenceparameter. In order to introduce some asymmetry for this persistence parameter, that is, a differentvalue when εt−1 < cσt−1 and when εt−1 >cσt−1 for some constant c (that is, depending on whetherthe price fell abnormally or not), consider the model{

εt = σtηtσt = ω + α+(εt−1 − cσt−1)

+ − α−(εt−1 − cσt−1)− + βσt−1.

(D.10)

1. Explain briefly the difference between the TGARCH model defined by (D.9) and the standardGARCH(1, 1) model, and why the TGARCH model might be of interest for financial seriesmodeling.

2. Rewrite (D.10) to introduce an asymmetric persistence parameter. Why might such a model beof interest?

3. Study of the TGARCH model defined by (D.9)

(a) Give expressions for ε+t and ε−t as functions of σt , η+t and η−t .

(b) Show that σt = ω + a(ηt−1)σt−1, where a is a function that will be specified.

(c) Give a sufficient condition for the existence of a nonanticipative and ergodic strictly sta-tionary solution.

(d) Specify this stationarity condition when β = 0 and ηt is N(0, 1) distributed.

(e) Give a necessary condition for the existence of a nonanticipative stationary solution suchthat Eσt <∞. Give Eσt , when it exists.

(f) Give a necessary condition for the existence of a nonanticipative stationary solution suchthat Eσ 2

t <∞. Give Eσ 2t , when it exists.

(g) Assume that ηt has a symmetric distribution. When they exist, give the almost sure limitsof

1

n

n∑t=1

ε+t ε+t−1 and

1

n

n∑t=1

ε+t ε−t−1

as n→∞. Give a simple empirical method for checking if α+ < α− (do not go into thedetails of the test).

Page 484: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

468 GARCH MODELS

4. Study of the model defined by (D.10)

(a) Give a sufficient condition for the existence of a nonanticipative and ergodic strictly sta-tionary solution.

(b) Give a necessary and sufficient condition for the existence of a nonanticipative stationarysolution such that Eσt <∞.

(c) Assume that ηt has a strictly positive density on R. Show that, except in the degeneratecase where α+ = α− = 0, the model is identifiable, that is, denoting the ‘true’ value of theparameter by σt = σt (θ), where θ = (ω, α+, α−, c, β), we have

σt (θ) = σt (θ∗) a.s. if and only if θ = θ∗.

(d) Give a method for estimating the parameter θ .

Page 485: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 469

Solution

1. In the standard GARCH formulation, the conditional variance σ 2t = ω + αε2

t−1 + βσ 2t−1 does

not depend on the sign of εt−1. In the TGARCH formulation with α−>α+, a negative returnεt−1 entails a greater increase in the volatility than a positive return of the same magnitude.Empirical studies have shown the existence of such asymmetries in most financial series.

2. We have

σt ={ω + α+εt−1 + (β − cα+)σt−1 if ηt−1 ≥ c,

ω − α−εt−1 + (β + cα−)σt−1 if ηt−1 ≤ c.

In this model the persistence coefficient is equal to (β − cα+)σt−1 when ηt−1 ≥ c (that is, whenεt−1 ≥ cσt−1), and (β − cα+)σt−1 when ηt−1 ≤ c. A motivation for considering this model isthat a negative shock should increase volatility more than a positive one of the same amplitude,and also that this increase should have a longer effect. By testing the assumption that c = 0,one can test whether the persistence of the negative shocks is the same as that of the positiveshocks.

3. .(a) The positivity of the coefficients guarantees that σt ≥ 0 (starting from an initial valueσ0 ≥ 0). We thus have ε+t = σtη

+t and ε−t = σtη

−t .

(b) In view of (a), σt = ω + a(ηt−1)σt−1, where a(η) = α+η+ − α−η− + β.

(c) For all n ≥ 1, let

st (n) = ω

{1+

n∑i=1

i∏k=1

a(ηt−k)

}.

We have st (n) = ω + a(ηt−1)st−1(n− 1) for all n. If st = limn→∞ st (n) exists, then theprevious equality still holds true at the limit, and the solution is given by σt = st (stationarityand ergodicity follow from the fact that st = f (ηt−1, ηt−2, . . . ), where f is a measurablefunction and (ηt ) is ergodic and stationary). Since all the terms involved in st (n) are positive,st is an increasing limit which exists in R+ ∪ {+∞}. By the Cauchy root criterion, the limitexists in R+ if

λ := lim supn→∞

{n∏

k=1

a(ηt−k)

}1/n

< 1.

By the law of large numbers, log λ = E log a(η1). The condition E log a(η1) < 0 thus guar-antees the existence of the solution.

(d) When β = 0 and the distribution of η1 is symmetric, we have

E log a(η1) =∫ ∞

0log(α+x)dPη(x)−

∫ 0

−∞log(α−x)dPη(x)

= 1

2log (α+α−)+ E log |η1|.

The condition is then given by α+α− < exp (−2E log |η1|).(e) If there exists a nonanticipative stationary solution such that Eσt <∞, then

Eσt = ω + Ea(ηt )Eσt = ω

1− α+Eη+t + α−Eη−t − β,

but this is possible only if Ea(ηt ) < 1, that is, α+Eη+1 − α−Eη−1 + β < 1.

Page 486: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

470 GARCH MODELS

(f) If there exists a nonanticipative stationary solution such that Eσ 2t <∞, then

Eσ 2t = ω2 + Ea2(ηt )Eσ

2t + 2ωEa(ηt )Eσt

=(

1+ 2Ea(ηt )

1− Ea(ηt )

)ω2

1− Ea2(ηt ),

which is possible only if Ea2(η1) < 1.

(g) By the ergodic theorem, and noting that ε+t ε−t = 0, under the assumption that the moments

exist we have

limn→∞

1

n

n∑t=1

ε+t ε+t−1 = Eη+t E(ω + α+ε+t−1 + βσt−1)ε

+t−1

and

limn→∞

1

n

n∑t=1

ε+t ε−t−1 = Eη+t E(ω − α−ε−t−1 + βσt−1)ε

−t−1, a.s.

Since the distribution of ηt is symmetric, we have

Eε+t = Eη+t Eσt = −Eε−t , Eε+t σt = −Eε−t σt

and E(ε+t )2 = E(ε−t )2 = Eσ 2t /2. For testing α+ < α− one can thus use the statistic

n−1∑nt=1 ε

+t εt−1, which should converge to E(η+t )(α+ − α−)Eσ 2

t /2.

4. .(a) The arguments of part 3(c) show that a condition for the existence of such a solution isE log b(η1) < 0 with

b(ηt ) = (α+ηt + β − cα+) 1lηt≥c + (−α−ηt + β + cα−) 1lηt<c

= α+(ηt − c)+ − α−(ηt − c)− + β.

(b) The arguments of part 3(e) show that the condition Eb(η1) < 1 is necessary. By Jensen’sinequality, E log b(η1) ≤ logEb(η1). The condition Eb(η1) < 1 is thus sufficient for theexistence of a strictly stationary solution of the form

σt = ω {1+ b(ηt )+ b(ηt )b(ηt−1)+ · · ·} .

By Beppo Levi’s theorem, this solution satisfies

Eσt = ω

1− Eb(η1)<∞.

(c) Let θ∗ = (ω∗, α∗+, α∗−, c∗, β∗) and consider the case c∗ ≤ c. The case c∗ ≥ c is handledsimilarly. Denote by Rt any measurable function with respect to σ {εu, u ≤ t}. We have

σt = α+σt−1(ηt−1 − c) 1l{ηt−1 >c} −α−σt−1(ηt−1 − c) 1l{ηt−1<c} +Rt−2,

and, writing σ ∗t = σt (θ∗),

σ ∗t = α∗+(σt−1ηt−1 − c∗σ ∗t−1) 1l{σt−1ηt−1 >c∗σ∗t−1}

−α∗−(σt−1ηt−1 − c∗σ ∗t−1) 1l{σt−1ηt−1<c∗σ∗

t−1} +Rt−2.

Page 487: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

PROBLEMS 471

Assume that σt = σ ∗t almost surely (for any t). Then we have

ηt−1(α+ − α∗+) 1l{ηt−1 >c} = Rt−2,

ηt−1(−α− − α∗+) 1l{c∗<ηt−1<c} = Rt−2,

ηt−1(−α− + α∗−) 1l{ηt−1<c∗} = Rt−2.

This implies that α+ = α∗+, α− = α∗−, and α∗+ = −α− if c∗ < c. Since α∗+ = −α−, we havec = c∗, α+ = α∗+ and α− = α∗−. We then have ω − ω∗ + (β − β∗)ηt−1Rt−2 = 0 a.s., whichentails ω = ω∗ and β = β∗.

(d) One could estimate θ by the quasi-maximum likelihood method, that is, with the aid of theestimator

θn = arg minθ∈�

n∑t=1

{ε2t

σ 2t (θ)

+ log σ 2t (θ)

},

where � is a compact parameter space which constrains the parameters to be positive, andσ 2t (θ) is defined recursively by

σt (θ) ={ω + α+εt−1 + (β − cα+)σt−1(θ) if εt−1 ≥ cσt−1(θ)

ω − α−εt−1 + (β + cα−)σt−1(θ) if εt−1 ≤ cσt−1(θ),

with for instance σ0(θ) = ω as initial value.

Page 488: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA
Page 489: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

References

Abramson, A. and Cohen, I. (2008) Single-sensor audio source separation using classification and estima-tion approach and GARCH modelling. IEEE Transactions on Audio Speech and Language Processing 16,1528–1540.

Alexander, C. (2002) Principal component models for generating large GARCH covariance matrices. EconomicNotes 31, 337–359.

Andersen, T.G. (1994) Stochastic autoregressive volatility: a framework for volatility modeling. MathematicalFinance 4, 103–119.

Andersen, T.G. and Bollerslev, T. (1998) Answering the skeptics: yes, standard volatility models do provideaccurate forecasts. International Economic Review 39, 885–906.

Andersen, T.G., Bollerslev, T., Diebold, F.X. and Labys, P. (2003) Modeling and forecasting realized volatility.Econometrica 71, 579–625.

Andersen, T.G., Davis, R.A., Kreiss, J.-P. and Mikosch, T. (eds) (2009) Handbook of Financial Time Series .Berlin: Springer.

Andrews, B. (2009) Rank based estimation for GARCH processes. Discussion Paper, Colorado StateUniversity.

Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent covariance matrix estimation.Econometrica 59, 817–858.

Andrews, D.W.K. (1997) Estimation when a parameter is on a boundary of the parameter space: part II.Unpublished manuscript, Yale University.

Andrews, D.W.K. (1999) Estimation when a parameter is on a boundary. Econometrica 67, 1341–1384.

Andrews, D.W.K. (2001) Testing when a parameter is on a boundary of the maintained hypothesis. Econo-metrica 69, 683–734.

Andrews, D.W.K. and Monahan, J.C. (1992) An improved heteroskedasticity and autocorrelation consistentcovariance matrix estimator. Econometrica 60, 953–966.

Ango Nze, P. and Doukhan, P. (2004) Weak dependence models and applications to econometrics. EconometricTheory 20, 995–1045.

Ardia, D. (2008) Financial Risk Management with Bayesian Estimation of GARCH Models: Theory and Appli-cations . Berlin: Springer.

Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1999) Coherent measures of risk. Mathematical Finance9, 203–228.

Aue, A., Hormann, S., Horvath, L. and Reimherr, M. (2009) Break detection in the covariance structure ofmultivariate time series models. Annals of Statistics 37, 4046–4087.

Avramov, D., Chordia, T. and A. Goyal. (2006) The impact of trades on daily volatility. Review of FinancialStudies 19, 1241–1277.

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Page 490: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

474 REFERENCES

Awartani, B.M.A. and Corradi, V. (2005) Predicting the volatility of the S&P-500 stock index via GARCHmodels: the role of asymmetries. International Journal of Forecasting 21, 167–183.

Azencott, R. and Dacunha-Castelle, D. (1984) Series d’observations irregulieres . Paris: Masson.

Bachelier, L. (1900) Theorie de la speculation. Annales Scientifiques de l’Ecole Normale Superieure 17, 21–86.

Bahadur, R.R. (1960) Stochastic comparison of tests. Annals of Mathematical Statistics 31, 276–295.

Bahadur, R.R. (1967) Rates of convergence of estimates and test statistics. Annals of Mathematical Statistics38, 303–324.

Bai, X., Russell, J.R. and Tiao, G.C. (2004) Kurtosis of GARCH and stochastic volatility models with non-normal innovations. Journal of Econometrics 114, 349–360.

Baillie, R.T., Bollerslev, T. and Mikkelsen, H.O. (1996) Fractionally integrated generalized autoregressiveconditional heteroscedasticity. Journal of Econometrics 74, 3–30.

Bardet, J.-M. and Wintenberger, O. (2009) Asymptotic normality of the quasi maximum likelihood estimatorfor multidimensional causal processes. Annals of Statistics 37, 2730–2759.

Barndorff-Nielsen, O.E. and Shephard, N. (2002) Econometric analysis of realized volatility and its use instochastic volatility models. Journal of the Royal Statistical Society B 64, 253–280.

Bartholomew, D.J. (1959) A test of homogeneity of ordered alternatives. Biometrika 46, 36–48.

Basrak, B., Davis, R.A. and Mikosch, T. (2002) Regular variation of GARCH processes. Stochastic Processesand their Applications 99, 95–115.

Bauwens, L., Laurent, S. and Rombouts, J.V.K. (2006) Multivariate GARCH models: a survey. Journal ofApplied Econometrics 21, 79–109.

Beguin, J.M., Gourieroux, C. and Monfort, A. (1980) Identification of a mixed autoregressive-moving averageprocess: the corner method. In O.D. Anderson (ed.), Time Series , pp. 423–436. Amsterdam: North-Holland.

Bekaert, G. and Wu, G. (2000) Asymmetric volatility and risk in equity markets. Review of Financial Studies13, 1–42.

Bera, A.K. and Higgins, M.L. (1997) ARCH and bilinearity as competing models for nonlinear dependence.Journal of Business and Economic Statistics 15, 43–50.

Beran, J. and Schutzner M. (2009) On approximate pseudo-maximum likelihood estimation for LARCH-processes. Bernoulli , 15, 1057–1081.

Berk, K.N. (1974) Consistent autoregressive spectral estimates. Annals of Statistics 2, 489–502.

Berkes, I. and Horvath, L. (2003a) Asymptotic results for long memory LARCH sequences. Annals of AppliedProbability 13, 641–668.

Berkes, I. and Horvath, L. (2003b) The rate of consistency of the quasi-maximum likelihood estimator. Statisticsand Probability Letters 61, 133–143.

Berkes, I. and Horvath, L. (2004) The efficiency of the estimators of the parameters in GARCH processes.Annals of Statistics 32, 633–655.

Berkes, I., Horvath, L. and Kokoszka, P. (2003a) Asymptotics for GARCH squared residual correlations.Econometric Theory 19, 515–540.

Berkes, I., Horvath, L. and Kokoszka, P. (2003b) GARCH processes: structure and estimation. Bernoulli 9,201–227.

Berlinet, A. (1984) Estimating the degrees of an ARMA model. Compstat Lectures 3, 61–94.

Berlinet, A. and Francq, C. (1997) On Bartlett’s formula for nonlinear processes. Journal of Time SeriesAnalysis 18, 535–555.

Bertholon, H., Monfort, A. and Pegoraro, F. (2008) Econometric asset pricing modelling. Journal of FinancialEconometrics 6, 407–458.

Bickel, P.J. (1982) On adaptative estimation. Annals of Statistics 10, 647–671.

Billingsley, P. (1961) The Lindeberg-Levy theorem for martingales. Proceedings of the American MathematicalSociety 12, 788–792.

Billingsley, P. (1995) Probability and Measure. New York: John Wiley & Sons, Inc.

Black, F. (1976) Studies of stock price volatility changes. Proceedings from the American Statistical Associa-tion, Business and Economic Statistics Section 177–181.

Black, F. and Scholes, M. (1973) The pricing of options and corporate liabilities. Journal of Political Economy3, 637–654.

Page 491: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

REFERENCES 475

Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31,307–327.

Bollerslev, T. (1988) On the correlation structure for the generalized autoregressive conditional heteroskedasticprocess. Journal of Time Series Analysis 9, 121–131.

Bollerslev, T. (1990) Modelling the coherence in short-run nominal exchange rates: a multivariate generalizedARCH model. Review of Economics and Statistics 72, 498–505.

Bollerslev, T. (2008) Glossary to ARCH (GARCH). In T. Bollerslev, J.R. Russell and M. Watson (eds),Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle. Oxford: Oxford UniversityPress.

Bollerslev, T., Chou, R.Y. and Kroner, K.F. (1992) ARCH modeling in finance: a review of the theory andempirical evidence. Journal of Econometrics 52, 5–59.

Bollerslev, T., Engle, R.F. and Nelson, D.B. (1994) ARCH models. In R.F Engle and D.L. McFadden (eds),Handbook of Econometrics , Vol. IV, Chap. 49, pp. 2959–3038. Amsterdam: North-Holland.

Bollerslev, T.P., Engle, R.F. and Woolridge, J. (1988) A capital asset pricing model with time varying covari-ances. Journal of Political Economy 96, 116–131.

Borkovec, M. and Kluppelberg, C. (2001) The tail of the stationary distribution of an autoregressive processwith ARCH(1) errors. Annals of Applied Probability 11, 1220–1241.

Bose, A. and Mukherjee, K. (2003) Estimating the ARCH parameters by solving linear equations. Journal ofTime Series Analysis 24, 127–136.

Boswijk, H.P. and van der Weide, R. (2006) Wake me up before before you GO-GARCH. Discussion paper,University of Amsterdam.

Bougerol, P. and Picard, N. (1992a) Strict stationarity of generalized autoregressive processes. Annals ofProbability 20, 1714–1729.

Bougerol, P. and Picard, N. (1992b) Stationarity of GARCH processes and of some nonnegative time series.Journal of Econometrics 52, 115–127.

Boussama, F. (1998) Ergodicite, melange et estimation dans les modeles GARCH. Doctoral thesis, UniversiteParis 7.

Boussama, F. (2000) Normalite asymptotique de l’estimateur du pseudo-maximum de vraisemblance d’unmodele GARCH. Comptes Rendus de L’Academie de Sciences Paris 331, 81–84.

Boussama, F. (2006) Ergodicite des chaınes de Markov a valeurs dans une variete algebrique: application auxmodeles GARCH maultivaries. Comptes Rendus de;’Academie de Sciences Paris 343, 275–278.

Box, G.E.P. and Jenkins, G.M. (1970) Time Series Analysis: Forecasting and Control . San Francisco: Holden-Day.

Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994) Time Series Analysis: Forecasting and Control . EnglewoodCliffs, NJ: Prentice Hall.

Box, G.E.P. and Pierce, D.A. (1970) Distribution of residual autocorrelations in autoregressive-integratedmoving average time series models. Journal of the American Statistical Association 65, 1509–1526.

Bradley, R.C. (1986) Basic properties of strong mixing conditions. In E. Eberlein and M.S. Taqqu (eds),Dependence in Probability and Statistics: A Survey of Recent Results , pp. 165–192. Boston: Birkhauser.

Bradley, R.C. (2005) Basic properties of strong mixing conditions. A survey and some open questions. Prob-ability Surveys 2, 107–144.

Brandt, A. (1986) The stochastic equation Yn+1 = AnYn + Bn with stationary coefficients. Advance in AppliedProbability 18, 221–254.

Breidt, F.J., Davis, R.A. and Trindade, A. (2001) Least absolute deviation estimation for all-pass time seriesmodels. Annals of Statistics 29, 919–946.

Brockwell, P.J. and Davis, R.A. (1991) Time Series: Theory and Methods , 2nd edition. New York: Springer.

Cai, J. (1994) A Markov model of switching-regime ARCH. Journal of Business and Economic Statistics 12,309–316.

Campbell, J.Y. and Hentschel, L. (1992) No news is good news: an asymmetric model of changing volatilityin stock returns, Journal of Financial Economics 31, 281–318.

Campbell, S.D. and Diebold, F.X. (2005) Weather forecasting for weather derivatives. Journal of the AmericanStatistical Association 100, 6–16.

Page 492: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

476 REFERENCES

Cappe, O., Moulines, E. and Ryden, T. (2005) Inference in Hidden Markov Models Series . New York: Springer.

Carrasco, M. and Chen, X. (2002) Mixing and moment properties of various GARCH and stochastic volatilitymodels. Econometric Theory 18, 17–39.

Chan, K.S. (1990) Deterministic stability, stochastic stability, and ergodicity. In H. Tong (ed.), Nonlinear TimeSeries: A Dynamical system Approach , pp. 448–466. Oxford: Oxford University Press.

Chen, M. and An, H.Z. (1998) A note on the stationarity and the existence of moments of the GARCH model.Statistica Sinica 8, 505–510.

Chen, C.W.S., Gerlach, R. and So, M.K.P. (2006) Comparison of nonnested asymmetric heteroskedastic models.Computational Statistics and Data Analysis 51, 2164–2178.

Chernoff, H. (1954) On the distribution of the likelihood ratio. Annals of Mathematical Statistics 54, 573–578.

Chow, Y.S. and Teicher, H. (1997) Probability Theory , 3rd edition. New York: Springer.

Christie, A.A. (1982) The stochastic behavior of common stock variances: value, leverage and interest rateeffects. Journal of Financial Economics 10, 407–432.

Christoffersen, P.F. (1998) Evaluating interval forecasts. International Economic Review 39, 841–862.

Christoffersen, P.F. and Jacobs, K. (2004) Which GARCH model for option valuation? Management Science50, 1204–1221.

Christoffersen, P.F. and Pelletier, D. (2004) Backtesting value-at-risk: a duration-based approach. Journal ofFinancial Econometrics 2, 84–108.

Clark, P.K. (1973) A subordinated stochastic process with fixed variance for speculative prices. Econometrica41, 135–155.

Cline, D.H.B. and Pu, H.H. (1998) Verifying irreducibility and continuity of a nonlinear time series. Statisticsand Probability Letters 40, 139–148.

Cochrane, J. (2001) Asset Pricing . Princeton, NJ: Princeton University Press.

Cohen, I. (2004) Modelling speech signal in the time-frequency domain using GARCH. Signal Processing 84,2453–2459.

Cohen, I. (2006) Speech spectral modeling and enhancement based on autoregressive conditional heteroscedas-ticity models. Signal Processing 86, 698–709.

Comte, F. and Lieberman, O. Asymptotic theory for multivariate GARCH processes. Journal of MultivariateAnalysis , 84, 61–84.

Corradi, V. (2000) Reconsidering the continuous time limit of the GARCH(1, 1) process. Journal of Econo-metrics 96, 145–153.

Cox, J., Ross, S. and Rubinstein, M. (1979) Option pricing: a simplified approach. Journal of FinancialEconomics 7, 229–263.

Dana, R.A. and Jeanblanc-Picque, M. (1994) Marches financiers en temps continu. Valoriation et equilibre.Paris: Economica.

Davidian, M. and Carroll, R.J. (1987) Variance function estimation. Journal of the American Statistical Asso-ciation 82, 1079–1091.

Davis, R.A., Knight, K. and Liu, J. (1992) M-estimation for autoregressions with infinite variance. StochasticProcesses and their Applications 40, 145–180.

Davis, R.A. and Mikosch, T. (2009) Extreme value theory for GARCH processes. In T.G. Andersen, R.A.Davis, J.-P. Kreiss and T. Mikosch (eds), Handbook of Financial Time Series . Berlin: Springer.

Davydov, Y.A. (1968) Convergence of distributions generated by stationary stochastic processes. Theory ofProbability and Applications 13, 691–696.

Davydov, Y.A. (1973) Mixing conditions for Markov chains. Theory of Probability and Applications 18,313–328.

Delbaen, F. (2002) Coherent measures of risk on general probability spaces. In K. Sandmann and P.J.Schonbucher (eds), Advances in Finance and Stochastics: Essays in Honor of Dieter Sondermann, pp. 1–37.Berlin: Springer.

Demos, A. and Sentana, E. (1998) Testing for GARCH effects: a one-sided approach. Journal of Econometrics86, 97–127.

den Hann, W.J. and Levin, A. (1997) A practitioner’s guide to robust covariance matrix estimation. In C.R.Rao and G.S. Maddala (eds), Handbook of Statistics 15, pp. 291–341. Amsterdam: North-Holland.

Page 493: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

REFERENCES 477

Diebold, F.X. (1986) Testing for serial correlation in the presence of ARCH. Proceedings of the AmericanStatistical Association, Business and Economics Statistics Section , 323–328.

Diebold, F.X. (2004) The nobel memorial prize for Robert F. Engle. Scandinavian Journal of Economics 106,165–185.

Diebolt, J. and Guegan, D. (1991) Le modele de serie chronologique autoregressive β-ARCH. Comptes Rendusde l’Academie des Sciences de Paris 312, 625–630.

Ding, Z. and Engle, R.F. (2001) Large scale conditional covariance matrix modeling, estimation and testing.Academia Economic Papers 29, 157–184.

Ding, Z., Granger, C. and Engle, R.F. (1993) A long memory property of stock market returns and a newmodel. Journal of Empirical Finance 1, 83–106.

Douc, R., Roueff, F. and Soulier, P. (2008) On the existence of some ARCH(∞) processes. Stochastic Processesand their Applications 118, 755–761.

Doukhan, P. (1994) Mixing: Properties and Examples , Lecture Notes in Statistics 85. New York: Springer-Verlag.

Doukhan, P., Teyssiere, G. and Winant, P. (2006) Vector valued ARCH infinity processes. In P. Bertail, P.Doukhan and P. Soulier (eds), Dependence in Probability and Statistics. New York: Springer.

Drost, F.C. and Klaassen, C.A.J. (1997) Efficient estimation in semiparametric GARCH models. Journal ofEconometrics 81, 193–221.

Drost, F.C., Klaassen, C.A.J. and Werker, B.J.M. (1997) Adaptive estimation in time series models. Annals ofStatistics 25, 786–818.

Drost, F.C. and Nijman, T.E. (1993) Temporal aggregation of GARCH processes. Econometrica 61, 909–927.

Drost, F.C., Nijman, T.E. and Werker, B.J.M. (1998) Estimation and testing in models containing both jumpsand conditional heteroskedasticity. Journal of Business and Economic Statistics 16, 237–243.

Drost, F.C. and Werker, B.J.M. (1996) Closing the GARCH gap: continuous time GARCH modeling. Journalof Econometrics 74, 31–57.

Duan, J.-C. (1995) The GARCH option pricing model. Mathematical Finance 5, 13–32.

Duan, J.-C. and Simonato, J.-G. (2001) American option pricing under GARCH by a Markov chain approxi-mation. Journal of Economic Dynamics and Control 25, 1689–1718.

Duchesne, P. and Lalancette, S. (2003) On testing for multivariate ARCH effects in vector time series models.Revue Canadienne de Statistique 31, 275–292.

Dueker, M.J. (1997) Markov switching in GARCH processes and mean-reverting stock market volatility.Journal of Business and Economic Statistics 15, 26–34.

Duffie, D. (1994) Modeles dynamiques d’evaluation . Paris: Presses Universitaires de France.

Dufour, J.-M., Khalaf, L., Bernard, J.-T. and Genest, I. (2004) Simulation-based finite-sample tests for het-eroskedasticity and ARCH effects. Journal of Econometrics 122, 317–347.

El Babsiri, M. and Zakoıan, J.-M. (1990) Approximation en temps continu d’un modele ARCH a seuil.Document de travail 9011, INSEE.

El Babsiri, M. and Zakoıan, J.-M. (2001) Contemporaneous asymmetry in GARCH processes. Journal ofEconometrics 101, 257–294.

Elie, L. (1994) Les processus ARCH comme approximations de processus en temps continu. In F. Droesbeke,F. Fichet and P. Tassi (eds), Modelisation ARCH: theorie statistique et applications dans le domaine de lafinance. Paris: Ellipses.

Engle, R.F. (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. infla-tion. Econometrica 50, 987–1008.

Engle, R.F. (1984) Wald, likelihood ratio, and Lagrange multiplier tests in econometrics. In Z. Griliches andM.D. Intriligator (eds), Handbook of Econometrics 2, pp. 775–826. Amsterdam: North-Holland.

Engle, R.F. (2001) GARCH 101: the use of ARCH/GARCH models in applied econometrics. Journal ofEconomic Perspectives 15, 157–68.

Engle, R.F. (2002a) Dynamic conditional correlation – a simple class of multivariate GARCH models. Journalof Business and Economic Statistics 20, 339–350.

Engle, R.F. (2002b) New frontiers for ARCH models. Journal of Applied Econometrics 17, 425–446.

Page 494: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

478 REFERENCES

Engle, R.F. (2004) Nobel lecture. Risk and volatility: econometric models and financial practice. AmericanEconomic Review 94, 405–420.

Engle, R.F. (2009) Anticipating Correlations. A New Paradigm for Risk Management . Princeton, NJ: PrincetonUniversity Press.

Engle, R.F. and Bollerslev, T. (1986) Modelling the persistence of conditional variances (with comments anda reply by the authors). Econometric Reviews 5, 1–87.

Engle, R.F. and Gonzalez-Rivera, G. (1991) Semiparametric ARCH models. Journal of Business and Econo-metric Statistics 9, 345–359.

Engle, R.F., Granger, C.W.J. and Kraft, D. (1984) Combining competing forecasts of inflation based on abivariate ARCH model. Journal of Economic Dynamics and Control 8, 151–165.

Engle, R.F. and Kroner, K. (1995) Multivariate simultaneous GARCH. Econometric Theory 11, 122–150.

Engle, R.F., Lilien, D.M. and Robins, R.P. (1987) Estimating time varying risk premia in the term structure:the ARCH-M model. Econometrica 55, 391–407.

Engle, R.F. and Manganelli, S. (2004) CAViaR: Conditional autoregressive value at risk by regression quantiles.Journal of Business and Economic Statistics 22, 367–381.

Engle, R.F. and Mustafa, C. (1992) Implied ARCH models from option prices. Journal of Econometrics 52,289–311.

Engle, R.F. and Ng, V.K. (1993) Measuring and testing the impact of news on volatility. Journal of Finance48, 1749–1778.

Engle, R.F., Ng, V.K. and Rothschild, M. (1990) Asset pricing with a factor ARCH covariance structure:empirical estimates for treasury bills. Journal of Econometrics 45, 213–238.

Engle, R.F. and Patton, A. (2001) What good is a volatility model? Quantitative Finance 1, 237–245.

Engle, R.F. and Rangel, A. (2008) The spline-GARCH model for low-frequency volatility and its globalmacroeconomic causes. Review of Financial Studies 21, 1187–1222.

Escanciano, J.C. (2009) Quasi-maximum likelihood estimation of semi-strong GARCH models. EconometricTheory 25, 561–570.

Escanciano, J.C. and Olmo, J. (2010) Backtesting parametric value-at-risk with estimation risk. Journal ofBusiness and Economics Statistics , 28(1), 36–51.

Ewing, B.T., Kruse, J.B. and Schroeder, J.L. (2006) Time series analysis of wind speed with time-varyingturbulence. Environmetrics 17, 119–127.

Fama, E.F. (1965) The behavior of stock market prices. Journal of Business 38, 34–105.

Feigin, P.D. and Tweedie, R.L. (1985) Random coefficient autoregressive processes: a Markov chain analysisof stationarity and finiteness of moments. Journal of Time Series Analysis 6, 1–14.

Francq, C., Horvath, L. and Zakoıan, J.-M. (2009) Merits and drawbacks of variance targeting in GARCHmodels. Preprint MPRA 15143, University Library of Munich, Germany.

Francq, C., Roussignol, M. and Zakoıan, J.-M. (2001) Conditional heteroskedasticity driven by hidden Markovchains. Journal of Time Series Analysis 22, 197–220.

Francq, C., Roy, R. and Zakoıan, J.-M. (2005) Diagnostic checking in ARMA models with uncorrelated errors.Journal of the American Statistical Association 100, 532–544.

Francq, C. and Zakoıan, J.-M. (2000) Estimating weak GARCH representations. Econometric Theory 16,692–728.

Francq, C. and Zakoıan, J.-M. (2004) Maximum likelihood estimation of pure GARCH and ARMA-GARCHprocesses. Bernoulli 10, 605–637.

Francq , C. and Zakoıan, J.-M. (2005) The L2-structures of standard and switching-regime GARCH models.Stochastic Processes and their Applications 115, 1557–1582.

Francq , C. and Zakoıan, J.-M. (2006a) Mixing properties of a general class of GARCH(1, 1) models withoutmoment assumptions on the observed process. Econometric Theory 22, 815–834.

Francq, C. and Zakoıan, J.-M. (2006b) On efficient inference in GARCH processes. In P. Bertail, P. Doukhanand P. Soulier (eds), Statistics for Dependent Data , pp. 305–327. New York: Springer.

Francq, C. and Zakoıan, J.-M. (2007) Quasi-maximum likelihood estimation in GARCH processes when somecoefficients are equal to zero. Stochastic Processes and their Applications 117, 1265–1284.

Page 495: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

REFERENCES 479

Francq, C. and Zakoıan, J.-M. (2008) Deriving the autocovariances of powers of Markov-switchingGARCH models, with applications to statistical inference. Computational Statistics and Data Analysis 52,3027–3046.

Francq, C. and Zakoıan, J.-M. (2009a) A tour in the asymptotic theory of GARCH estimation. In T.G. Andersen,R.A. Davis, J.-P. Kreiss and T. Mikosch (eds), Handbook of Financial Time Series . Berlin: Springer.

Francq, C. and Zakoıan, J.-M. (2009b) Testing the nullity of GARCH coefficients: correction of the standardtests and relative efficiency comparisons. Journal of the American Statistical Association 104, 313–324.

Francq, C. and Zakoıan, J.-M. (2009c) Inconsistency of the QMLE and asymptotic normality of the weightedLSE for a class of conditionally heteroscedastic models. Preprint MPRA 15147, University Library ofMunich, Germany.

Francq, C. and Zakoıan, J.-M. (2009d) Bartlett’s formula for a general class of nonlinear processes. Journalof Time Series Analysis 30, 449–465.

Franke, J., Hardle, W. and Hafner, C. (2004) Statistics of Financial Markets . Berlin: Springer.

Franses, P.H. and van Dijk, D. (2000) Non-linear Time Series Models in Empirical Finance. Cambridge:Cambridge University Press.

Fruhwirth-Schnatter, S. (2006) Finite Mixture and Markov Switching Models . New York: Springer.

Fryzlewicz, P. and Subba Rao, S. (2008) On some mixing properties of ARCH and time-varying ARCHprocesses. Technical report, Department of Statistics at Texas A&M University.

Garcia, R., Ghysels, E. and Renault, E. (1998) A note on hedging in ARCH and stochastic volatility optionpricing models. Mathematical Finance 8, 153–161.

Geweke, J. (1989) Exact predictive densities for linear models with ARCH disturbances. Journal of Econo-metrics 40, 63–86.

Giraitis, L., Kokoszka, P. and Leipus, R. (2000) Stationary ARCH models: dependence structure and centrallimit theorem. Econometric Theory 16, 3–22.

Giraitis, L., Leipus, R., Robinson, P.M. and Surgailis, D. (2004) LARCH, leverage, and long memory. Journalof Financial Econometrics 2, 177–210.

Giraitis, L., Leipus, R. and Surgailis, D. (2006) Recent advances in ARCH modelling. In A. Kirman and G.Teyssiere (eds), Long-Memory in Economics , pp. 3–38. Berlin: Springer.

Giraitis, L., Leipus, R. and Surgailis, D. (2009) ARCH(∞) and long memory properties. In T.G. Andersen,R.A. Davis, J.-P. Kreiss and T. Mikosch (eds), Handbook of Financial Time Series . Berlin: Springer.

Giraitis, L. and Robinson, P.M. (2001) Whittle estimation of ARCH models. Econometric Theory 17, 608–631.

Giraitis, L., Robinson, P.M. and Surgailis, D. (2000) A model for long memory conditional heteroscedasticity.Annals of Applied Probability 10, 1002–1024.

Giraitis, L. and Surgailis, D. (2002) ARCH-type bilinear models with double long memory. Stochastic Processesand their Applications 100, 275–300.

Glasbey, C.A. (1982) A generalization of partial autocorrelation function useful in identifying ARMA models.Technometrics 24, 223–228.

Glosten, L.R., Jagannathan, R. and Runkle, D. (1993) On the relation between the expected values and thevolatility of the nominal excess return on stocks. Journal of Finance 48, 1779–1801.

Godfrey, L.G. (1988) Misspecification Tests in Econometrics: The Lagrange Multiplier Principle and OtherApproaches . Cambridge: Cambridge University Press.

Goldsheid, I.Y. (1991) Lyapunov exponents and asymptotic behavior of the product of random matrices. In L.Arnold, H. Crauel and J.-P. Eckmann (eds), Lyapunov Exponents. Proceedings of the Second Conference heldin Oberwolfach, May 28–June 2, 1990 , Lecture Notes in Mathematics 1486, pp. 23–37. Berlin: Springer.

Goncalves, E. and Mendes Lopes, N. (1994) The generalized threshold ARCH model: wide sense stationarityand asymptotic normality of the temporal aggregate. Publications de l’Institut de Statistique de Paris 38,19–35.

Goncalves, E. and Mendes Lopes, N. (1996) Stationarity of GTARCH processes. Statistics 28, 171–178.

Gonzalez-Rivera, G. (1998) Smooth transition GARCH models. Studies in Nonlinear Dynamics and Econo-metrics 3, 61–78.

Gonzalez-Rivera, G. and Drost, F.C. (1999) Efficiency comparisons of maximum-likelihood-based estimatorsin GARCH models. Journal of Econometrics 93, 93–111.

Page 496: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

480 REFERENCES

Gourieroux, C. (1997) ARCH Models and Financial Applications . New York: Springer.

Gourieroux, C., Holly, A. and Monfort, A. (1982) Likelihood ratio test, Wald test and Kuhn-Tucker test inlinear models with inequality constraints on the regression parameters. Econometrica 50, 63–79.

Gourieroux, C. and Jasiak, J. (2001) Financial Econometrics . Princeton, NJ: Princeton University Press.

Gourieroux, C. and Jasiak, J. (2008) Dynamic quantile models. Journal of Econometrics 147, 198–205.

Gourieroux, C. and Monfort, A. (1992) Qualitative threshold ARCH models. Journal of Econometrics 52,159–200.

Gourieroux, C. and Monfort, A. (1995) Statistics and Econometric Models . Cambridge: Cambridge UniversityPress.

Gourieroux, C. and Monfort, A. (1996) Time Series and Dynamic Models . Cambridge: Cambridge UniversityPress.

Gourieroux, C., Monfort, A. and Renault, E. (1993) Indirect inference. Journal of Applied Econometrics 8,S85–S118.

Gourieroux, C. and Tiomo, A. (2007) Risque de credit . Paris: Economica.

Hafner, C. (2003) Fourth moment structure of multivariate GARCH models. Journal of Financial Econometrics1, 26–54.

Hafner, C. and Linton, O.B. (2010) Efficient estimation of a multivariate multiplicative volatility model. Journalof Econometrics , doi:10.1016/j.jeconom.2010.04.007.

Hafner, C. and Preminger, A. (2009a) Asymptotic theory for a factor GARCH model. Econometric Theory25, 336–363.

Hafner, C. and Preminger, A. (2009b) On asymptotic theory for multivariate GARCH models. Journal ofMultivariate Analysis 100, 2044–2054.

Hagerud, G.E. (1997) A New Non-linear GARCH Model . PhD thesis, IFE, Stockholm School of Economics.

Hall, P. and Yao, Q. (2003) Inference in ARCH and GARCH models with heavy-tailed errors. Econometrica71, 285–317.

Hamilton, J.D. (1989) A new approach to the economic analysis of nonstationary time series and the businesscycle. Econometrica 57, 357–384.

Hamilton, J.D. (1994) Time Series Analysis. Princeton, NJ: Princeton University Press.

Hamilton, J.D. and R. Susmel (1994) Autoregressive conditional heteroskedasticity and changes in regime.Journal of Econometrics 64, 307–333.

Hannan, E.J. (1970) Multiple Time Series . New York: John Wiley & Sons, Inc.

Hannan, E.J. (1976) The identification and parametrization of ARMAX and state space forms. Econometrica44, 713–723.

Hannan, E.J. and Deistler, M. (1988) The Statistical Theory of Linear Systems . New York: John Wiley& Sons,Inc.

Hansen, B.E. (1994) Autoregressive conditional density estimation. International Economic Review 35,705–730.

Hansen, L.P. and. Richard, S.F (1987) The role of conditioning information in deducing testable restrictionsimplied by dynamic asset pricing models. Econometrica 55, 587–613.

Hansen, P.R. and Lunde, A. (2005) A forecast comparison of volatility models: does anything beat aGARCH(1, 1)? Journal of Applied Econometrics 20, 873–889.

Hardle, W. and Hafner, C. (2000) Discrete time option pricing with flexible volatility estimation. Finance andStochastics 4, 189–207.

Harvey, A.C., Ruiz, E. and Sentana, E. (1992) Unobserved component time series models with ARCH distur-bances. Journal of Econometrics 52, 129–158.

Harville, D. (1997) Matrix Algebra from a Statistician’s Perspective. New York: Springer.

He, C. and Terasvirta, T. (1999) Fourth-moment structure of the GARCH(p, q) process. Econometric Theory15, 824–846.

He, C. and Terasvirta, T. (2004) An extended constant conditional correlation GARCH model and its fourth-moment structure. Econometric Theory 20, 904–926.

Hentschel, L. (1995) All in the family: nesting symmetric and asymmetric GARCH models. Journal of Finan-cial Economics 39, 71–104.

Page 497: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

REFERENCES 481

Herrndorf, N. (1984) A functional central limit theorem for weakly dependent sequences of random variables.Annals of Probability 12, 141–153.

Heston, S. and Nandi, S. (2000) A closed form GARCH option pricing model. Review of Financial Studies13, 585–625.

Higgins, M.L. and Bera, A.K. (1992) A class of nonlinear ARCH models. International Economic Review 33,137–158.

Hobson, D.G. and Rogers, L.C.G. (1998) Complete models with stochastic volatility. Mathematical Finance8, 27–48.

Hong, Y. (1997) One sided testing for autoregressive conditional heteroskedasticity in time series models.Journal of Time Series Analysis 18, 253–277.

Hong, Y. and Lee, Y.J. (2001) One-sided testing for ARCH effects using wavelets. Econometric Theory 17,1051–1081.

Hormann, S. (2008) Augmented GARCH sequences: dependence structure and asymptotics. Bernoulli 14,543–561.

Horvath, L., Kokoszka, P. and Teyssiere, G. (2004) Bootstrap misspecification tests for ARCH based on theempirical process of squared residuals. Journal of Statistical Computation and Simulation 74, 469–485.

Horvath, L. and Liese, F. (2004) Lp-estimators in ARCH models. Journal of Statistical Planning and Inference119, 277–309.

Horvath, L. and Zitikis, R. (2006) Testing goodness of fit based on densities of GARCH innovations. Econo-metric Theory 22, 457–482.

Hoti, S., McAleer, M. and Chan, F. (2005) Modelling the spillover effects in the volatility of atmosphericcarbon dioxide concentrations. Mathematics and Computers in Simulation 69, 46–56.

Hsieh, K.C. and Ritchken, P. (2005) An empirical comparison of GARCH option pricing models. Review ofDerivatives Research 8, 129–150.

Huang, H.-H., Shiu, Y.-M. and Liu, P.-S. (2008) HDD and CDD option pricing with market price of weatherrisk for Taiwan. Journal of Futures Markets 28, 790–814.

Hull , J. and White, A. (1987) The pricing of options on assets with stochastic volatilities. Journal of Finance42, 281–300.

Hurlin , C. and Tokpavi, S. (2006) Backtesting value-at-risk accuracy: a simple new test. Journal of Risk 9,19–37.

Hwang, S.Y. and Kim, T.Y. (2004) Power transformation and threshold modeling for ARCH innovations withapplications to tests for ARCH structure. Stochastic Processes and Their Applications 110, 295–314.

Jeantheau, T. (1998) Strong consistency of estimators for multivariate ARCH models. Econometric Theory 14,70–86.

Jeantheau, T. (2004) A link between complete models with stochastic volatility and ARCH models. Financeand Stochastics 8, 111–131.

Jensen, S.T. and Rahbek, A. (2004a) Asymptotic normality of the QMLE estimator of ARCH in the nonsta-tionary case. Econometrica 72, 641–646.

Jensen, S.T. and Rahbek, A. (2004b) Asymptotic inference for nonstationary GARCH. Econometric Theory20, 1203–1226.

Jordan, H. (2003) Asymptotic properties of ARCH (p) quasi maximum likelihood estimators under weak con-ditions . PhD thesis, University of Vienna.

Kahane , J.-P. (1998) Le mouvement brownien, un essai sur les origines de la theorie mathematique. Seminaireset Congres SMF , 123–155.

Karanasos, M. (1999) The second moment and the autocovariance function of the squared errors of the GARCHmodel. Journal of Econometrics 90, 63–76.

Karatzas, I. and Shreve, S.E. (1988) Brownian Motion and Stochastic Calculus . New York: Springer.

Kazakevicius, V. and Leipus, R. (2002) On stationarity in the ARCH(∞) model. Econometric Theory 18,1–16.

Kazakevicius, V. and Leipus, R. (2003) A new theorem on existence of invariant distributions with applicationsto ARCH processes. Journal of Applied Probability 40, 147–162.

Page 498: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

482 REFERENCES

Kazakevicius, V. and Leipus, R. (2007) On the uniqueness of ARCH processes. Lithuanian MathematicalJournal 47, 53–57.

Kesten, H. and Spitzer, F. (1984) Convergence in distribution for products of random matrices. Zeitschrift furWahrscheinlichkeitstheorie und Verwandte Gebiete 67, 363–386.

Kim, S., Shephard, N. and Chib, S. (1998) Stochastic volatility: likelihood inference and comparison withARCH models. Review of Economic Studies 65, 361–396.

King, M., Sentana, E. and Wadhwani, S. (1994) Volatility and links between national stock markets. Econo-metrica 62, 901–933.

King, M.L. and Wu, P.X. (1997) Locally optimal one-sided tests for multiparameter hypotheses. EconometricReviews 16, 131–156.

Kingman, J.F.C. (1973) Subadditive ergodic theory. Annals of Probability 6, 883–909.

Koenker, R. and Xiao, Z. (2006) Quantile autoregression. Journal of the American Statistical Society 101,980–990.

Kokoszka, P.S. and Taqqu, M.S. (1996) Parameter estimation for infinite variance fractional ARIMA. Annalsof Statistics 24, 1880–1913.

Kluppelberg, C., Lindner, A. and Maller, R. (2004) A continuous time GARCH process driven by a Levyprocess: stationarity and second order behaviour. Journal of Applied Probability 41, 601–622.

Lanne, M. and Saikkonen, P. (2006) Why is it so difficult to uncover the risk-return tradeoff in stock returns?Economic Letters 92, 118–125.

Lanne, M. and Saikkonen, P. (2007) A multivariate generalized orthogonal factor GARCH model. Journal ofBusiness and Economic Statistics 25, 61–75.

Laurent, S. (2009) GARCH 6: Estimating and Forecasting ARCH Models . Timberlake Consultants Press.

Lee, J.H.H. and King, M.L. (1993) A locally most mean powerful based score test for ARCH and GARCHregression disturbances. Journal of Business and Economic Statistics 11, 17–27.

Lee, S. and Taniguchi, M. (2005) Asymptotic theory for ARCH-SM models: LAN and residual empiricalprocesses. Statistica Sinica 15, 215–234.

Lee, S.W. and Hansen, B.E. (1994) Asymptotic theory for the GARCH(1, 1) quasi-maximum likelihood esti-mator, Econometric Theory 10, 29–52.

Li, C.W. and Li, W.K. (1996) On a double-threshold autoregressive heteroscedastic time series model. Journalof Applied Econometrics 11, 253–274.

Li, W.K. (2004) Diagnostic Checks in Time Series. Boca Raton, FL: Chapman & Hall/CRC.

Li, W.K., Ling, S. and McAleer, M. (2002) Recent theoretical results for time series models with GARCHerrors. Journal of Economic Surveys 16, 245–269.

Li, W.K. and Mak, T.K. (1994) On the squared residual autocorrelations in non- linear time series withconditional heteroscedasticity. Journal of Time Series Analysis 15, 627–636.

Ling, S. (2005) Self-weighted LSE and MLE for ARMA-GARCH models. Unpublished paper, Hong-KongUniversity of Science and Technology.

Ling, S. (2007) Self-weighted and local quasi-maximum likelihood estimators for ARMA-GARCH/IGARCHmodels. Journal of Econometrics 140, 849–873.

Ling, S. and Li, W.K. (1997) On fractionally integrated autoregressive moving-average time series modelswith conditional heteroscedasticity. Journal of the American Statistical Association 92, 1184–1194.

Ling, S. and Li, W.K. (1998) Limiting distributions of maximum likelihood estimators for unstable ARMAmodels with GARCH errors. Annals of Statistics 26, 84–125.

Ling, S. and McAleer, M. (2002a) Necessary and sufficient moment conditions for the GARCH(r, s) andasymmetric power GARCH(r, s) models. Econometric Theory 18, 722–729.

Ling, S. and McAleer, M. (2002b) Stationarity and the existence of moments of a family of GARCH processes.Journal of Econometrics 106, 109–117.

Ling, S. and McAleer, M. (2003a) Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory19, 280–310.

Ling, S. and McAleer, M. (2003b) Adaptative estimation in nonstationary ARMA models with GARCH errors.Annals of Statistics 31, 642–674.

Linton, O. (1993) Adaptive estimation in ARCH models. Econometric Theory 9, 539–564.

Page 499: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

REFERENCES 483

Liu J., Li, W.K. and Li, C.W. (1997) On a threshold autoregression with conditional heteroskedastic variances.Journal of Statistical Planning and Inference 62, 279–300.

Liu, J.-C. (2007) Stationarity of a Markov-switching GARCH model. Journal of Financial Econometrics 4,573–593.

Ljung, G.M. and Box, G.E.P. (1978) On the measure of lack of fit in time series models. Biometrika 65,297–303.

Longerstaey, J. (1996). RiskMetrics technical document. Technical Report fourth edition. JP Morgan/Reuters.http://www.riskmetrics.com/system/files/private/td4e.pdf (accessed April 2010).

Lumsdaine, R.L. (1996) Consistency and asymptotic normality of the quasi-maximum likelihood estimator inIGARCH(1, 1) and covariance stationary GARCH(1, 1) models. Econometrica 64, 575–596.

Lutkepohl, H. (1991) Introduction to Multiple Time Series Analysis. Berlin: Springer.

Mandelbrot, B. (1963) The variation of certain speculative prices. Journal of Business 36, 394–419.

Markowitz, H. (1952) Portfolio selection. Journal of Finance 7, 77–91.

McAleer, M. and Chan, F. (2006) Modelling trends and volatility in atmospheric carbon dioxide concentrations.Environmental Modelling and Software 21, 1273–1279.

McLeod, A.I. and Li, W.K. (1983) Diagnostic checking ARMA time series models using squared-residualautocorrelations. Journal of Time Series Analysis 4, 269–273

McNeil, A.J., Frey, R. and Embrechts, P. (2005) Quantitative Risk Management . Princeton, NJ: PrincetonUniversity Press.

Meitz, M., and Saikkonen, P. (2008a) Stability of nonlinear AR-GARCH models. Journal of Time SeriesAnalysis 29, 453–475.

Meitz, M., and Saikkonen, P. (2008b) Ergodicity, mixing, and existence of moments of a class of Markovmodels with applications to GARCH and ACD models. Econometric Theory 24, 1291–1320.

Merton, R.C. (1973) An intertemporal capital asset pricing model. Econometrica 41, 867–887.

Meyn, S.P. and Tweedie, R.L. (1996) Markov Chains and Stochastic Stability , 3rd edition. London: Springer.

Mikosch, T. (2001) Modeling financial time series. Lecture presented at ‘New Directions in Time SeriesAnalysis’, CIRM, Luminy.

Mikosch, T., Gadrich, T., Kluppelberg, C. and Adler, R.J. (1995) Parameter estimation for ARMA modelswith infinite variance innovations. Annals of Statistics 23, 305–326.

Mikosch, T. and Starica, C. (2000) Limit theory for the sample autocorrelations and extremes of a GARCH(1, 1)process. Annals of Statistics 28, 1427–1451.

Mikosch, T. and Starica, C. (2004) Non-stationarities in financial time series, the long-range dependence, andthe IGARCH effects. Review of Economics and Statistics 86, 378–390.

Mikosch, T. and Straumann, D. (2002) Whittle estimation in a heavy-tailed GARCH(1, 1) model. StochasticProcesses and their Applications 100, 187–222.

Milhøj, A. (1984) The moment structure of ARCH processes. Scandinanvian Journal of Statistics 12, 281–292.

Mills, T.C. (1993) The Econometric Modelling of Financial Time Series . Cambridge: Cambridge UniversityPress.

Mokkadem, A. (1990) Proprietes de melange des processus autoregressifs polynomiaux. Annales de l’InstitutHenri Poincare 26, 219–260.

Moulin, H. and Fogelman-Soulie, F. (1979) La convexite dans les mathematiques de la decision: optimisationet theorie micro-economique. Paris: Hermann.

Muler, N. and Yohai, V.J. (2008) Robust estimates for GARCH models. Journal of Statistical Planning andInference 138, 2918–2940.

Nelson, D.B. (1990a) Stationarity and persistence in the GARCH(1, 1) model. Econometric Theory 6,318–334.

Nelson, D.B. (1990b) ARCH models as diffusion approximations. Journal of Econometrics 45, 7–38.

Nelson, D.B. (1991) Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59,347–370.

Nelson, D.B. (1992) Filtering and forecasting with misspecified ARCH models I: Getting the right variancewith the wrong model. Journal of Econometrics 52, 61–90.

Page 500: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

484 REFERENCES

Nelson, D.B. and Cao, C.Q. (1992) Inequality constraints in the univariate GARCH model. Journal of Businessand Economic Statistics 10, 229–235.

Nelson, D.B. and Foster, D.P. (1994) Asymptotic filtering theory for univariate ARCH models. Econometrica62, 1–42.

Nelson, D.B. and Foster, D.P. (1995) Filtering and forecasting with misspecified ARCH models II: Makingthe right forecast with the wrong model. Journal of Econometrics 67, 303–335.

Nijman, T. and Sentana, E. (1996) Marginalization and contemporaneous aggregation in multivariate GARCHprocesses. Journal of Econometrics 71, 71–87.

Nummelin, E. (1984) General Irreducible Markov Chains and Non-negative Operators . Cambridge: CambridgeUniversity Press.

Pagan, A.R. (1996) The econometrics of financial markets. Journal of Empirical Finance 3, 15–102.

Pagan, A.R. and Schwert, G.W. (1990) Alternative models for conditional stock volatility. Journal of Econo-metrics 45, 267–290.

Palm, F.C. (1996) GARCH models of volatility. In G.S. Maddala and C.R. Rao (eds), Handbook of Statistics14, pp. 209–240. Amsterdam: North-Holland.

Pantula, S.G. (1989) Estimation of autoregressive models with ARCH errors. Sankhya B 50, 119–138.

Peng L. and Yao, Q. (2003) Least absolute deviations estimation for ARCH and GARCH models. Biometrika90, 967–975.

Perlman, M.D. (1969) One-sided testing problems in multivariate analysis. Annals of Mathematical Statistics40, 549–567. Corrections in Annals of mathematical Statistics (1971) 42, 1777.

Priestley, M.B. (1988) Non-linear and Non-stationary Time Series Analysis . New York: Academic Press.

Rabemananjara, R. and Zakoıan, J.-M. (1993) Threshold ARCH models and asymmetries in volatility. Journalof Applied Econometrics 8, 31–49.

Reinsel, G.C. (1997) Elements of Multivariate Time Series Analysis . New York: Springer.

Rich, R.W., Raymond, J. and Butler, J.S. (1991) Generalized instrumental variables estimation of autoregressiveconditional heteroskedastic models. Economics Letters 35, 179–185.

Rio, E. (1993) Covariance inequalities for strongly mixing processes. Annales de l’Institut Henri Poincare,Probabilites et Statistiques 29, 587–597.

Robinson, P.M. (1991) Testing for strong correlation and dynamic conditional heteroskedasticity in multipleregression. Journal of Econometrics 47, 67–84.

Robinson, P.M. and Zaffaroni, P. (2006) Pseudo-maximum likelihood estimation of ARCH(∞) models. Annalsof Statistics 34, 1049–1074.

Rogers, A.J. (1986) Modified Lagrange multiplier tests for problems with one-sided alternatives. Journal ofEconometrics 31, 341–361.

Romano, J.P. and Thombs, L.A. (1996) Inference for autocorrelations under weak assumptions. Journal of theAmerican Statistical Association 91, 590–600.

Romilly, P. (2006) Time series modelling of global mean temperature for managerial decision-making. Journalof Environmental Management 76, 61–70.

Rosenberg, B. (1972) The behavior of random variables with nonstationary variance and the distribution ofsecurity prices. Working Paper 11, Graduate School of Business Administration, University of Califor-nia, Berkeley. Also reproduced in N. Shephard (ed.), Stochastic Volatility , pp. 83–108. Oxford: OxfordUniversity Press (2005).

Rosenblatt, M. (1956) A central limit theorem and a strong mixing condition. Proceedings of the NationalAcademy of Science of the USA 42, 43–47.

Sabbatini, M. and Linton, O. (1998) A GARCH model of the implied volatility of the Swiss market indexfrom option prices. International Journal of Forecasting 14, 199–213.

Schwert, G.W. (1989) Why does stock market volatility change over time? Journal of Finance 14, 1115–1153.

Sentana, E. (1995) Quadratic ARCH models. Review of Economic Studies 62, 639–661.

Sharpe, W. (1964) Capital asset prices: a theory of market equilibrium under conditions of risk. Journal ofFinance 19, 425–442.

Page 501: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

REFERENCES 485

Shephard, N. (1996) Statistical aspects of ARCH and stochastic volatility. In D.R. Cox, D.V. Hinkley and O.E.Barndorff-Nielsen (eds), Time Series Models in Econometrics, Finance and Other Fields , 1–67. London:Chapman & Hall.

Shephard, N. (2005) Introduction. In N. Shephard (ed.), Stochastic Volatility , pp. 1–33. Oxford: Oxford Uni-versity Press.

Silvapulle, M.J. and Silvapulle, P. (1995) A score test against one-sided alternatives. Journal of the AmericanStatistical Association 90, 342–349.

Silvennoinen, A. and Terasvirta, T. (2008) Multivariate GARCH models. In T.G. Andersen, R.A. Davis, J-P.Kreiss and T. Mikosch (eds), Handbook of Financial Time Series . New York: Springer.

Starica, C. (2006) Is GARCH(1, 1) as good a model as the accolades of the Nobel Prize would imply? Workingpaper, Chalmers University of Technology.

Stentoft, L. (2005) Pricing American options when the underlying asset follows GARCH processes. Journalof Empirical Finance 12, 576–611.

Stentoft, L. (2008) Option pricing using realized volatility. CREATES Research Paper 2008-13, Aarhus Busi-ness School.

Straumann, D. (2005) Estimation in Conditionally Heteroscedastic Time Series Models. Lecture Notes in Statis-tics 181. Berlin: Springer.

Straumann, D. and Mikosch, T. (2006) Quasi-maximum-likelihood estimation in heteroscedastic time series:a stochastic recurrence equation approach. Annals of Statistics 34, 2449–2495.

Stroock, D. and Varadhan, S. (1979) Multidimensional Diffusion Processes. Berlin: Springer-Verlag.

Taylor, J.W. (2004) Volatility forecasting with smooth transition exponential smoothing. International Journalof Forecasting 20, 273–286.

Taylor, S.J. (1986) Modelling Financial Time Series . Chichester: John Wiley & Sons, Ltd.

Taylor, S.J. (2007) Asset Price Dynamics, Volatility and Prediction. Princeton, NJ: Princeton University Press.

Tjøstheim, D. (1990) Non-linear time series and Markov chains. Advances in Applied Probability 22, 587–611.

Tol, R.S.J. (1996) Autoregressive conditional heteroscedasticity in daily temperature measurements. Environ-metrics 7, 67–75.

Tong, H. (1978) On a threshold model. In C.H. Chen (ed.), Pattern Recognition and Signal Processing , pp.575–586. Amsterdam: Sijthoff and Noordhoff.

Tong, H. and Lim, K.S. (1980) Threshold autoregression, limit cycles and cyclical data. Journal of the RoyalStatistical Society B 42, 245–292.

Truquet, L. (2008) Penalized quasi maximum likelihood estimator. Working document.

Tsai, H. and Chan, K.-S. (2008) A note on inequality constraints in the GARCH model. Econometric Theory24, 823–828.

Tsay, R.S. (2002) Analysis of Financial Time Series . New York: John Wiley & Sons, Inc.

Tse, Y.K. (2002) Residual-based diagnostics for conditional heteroscedasticity models. Econometrics Journal5, 358–373.

Tse, Y.K. and Tsui, A. (2002) A multivariate GARCH model with time-varying correlations. Journal of Businessand Economic Statistics 20, 351–362.

Tweedie R.L. (1998) Markov chains: structure and applications. Technical Report, Department of Statistics,Colorado State University.

van der Vaart, A.W. (1998) Asymptotic Statistics . Cambridge: Cambridge University Press.

van der Weide, R. (2002) GO-GARCH: a multivariate generalized orthogonal GARCH model. Journal ofApplied Econometrics 17, 549–564.

Visser, M.P. (2009) Volatility proxies and GARCH models . PhD thesis, Korteweg-de Vries Institute for Math-ematics, Amsterdam.

Vrontos, I.D., Dellaportas, P. and Politis, D.N. (2003) A full-factor multivariate GARCH model. EconometricsJournal 6, 312–334.

Wang, S. and Dhaene, J. (1998) Comonotonicity, correlation order and premium principles. Insurance: Math-ematics and Economics 22, 235–242.

Wang, S., Young, V. and Panjer, H. (1997) Axiomatic characterization of insurance prices. Insurance: Math-ematics and Economics 21, 173–183.

Page 502: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

486 REFERENCES

Weiss, A.A. (1984) ARMA models with ARCH errors. Journal of Time Series Analysis 5, 129–143.

Weiss, A.A. (1986) Asymptotic theory for ARCH models: estimation and testing. Econometric Theory 2,107–131.

Wolak, F.A. (1989) Local and global testing of linear and nonlinear inequality constraints in nonlinear econo-metric models. Econometric Theory 5, 1–35.

Wold, H. (1938) A Study in the Analysis of Stationary Time Series . Uppsala: Almqvist & Wiksell.

Wong, E. (1964) The construction of a class of stationary Markov processes. In R. Bellman (ed.), StochasticProcesses in Mathematical Physics and Engineering , Proceedings of Symposia in Applied Mathematics, 16,pp. 264–276. Providence, RI: American Mathematical Society.

Wright, S.P. (1992) Adjusted p-values for simultaneous inference. Biometrics 48, 1005–1013.

Wu, G. (2001) The determinants of asymmetric volatility. Review of Financial Studies 837–859.

Xekalaki, E. and Degiannakis, S. (2009) ARCH Models for Financial Applications . Chichester: John Wiley &Sons, Ltd.

Zakoıan, J.-M. (1994) Threshold heteroskedastic models. Journal of Economic Dynamics and Control 18,931–955.

Zarantonello, E.H. (1971) Projections on convex sets in Hilbert spaces and spectral theory. In E.H. Zarantonello(ed.), Contributions to Nonlinear Functional Analysis , pp. 237–424. New York: Academic Press.

Page 503: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

Index

Adaptive GARCH estimator, 223Aggregation

contemporaneous, 84stability by, 276temporal, 79–81

APARCH (asymmetric power ARCH), 256ARCH effect test, 115, 196, 199ARCH(∞), 39–44ARIMA, 5ARMA, 4, 5, 12, 13

autocorrelation, 5for squares of GARCH, 20identification, 100multivariate, 274

identifiability, 292Asymmetries, see Leverage effect, 245Autocorrelation

empirical, 2, 3, 101, 108–110partial, 106, 108, 353theoretical, 2

Autocovarianceempirical, 2, 3for squares of GARCH, 50–53multivariate, 273theoretical, 2

Autoregressive moving average model, seeARMA, 4

Bahadur’s approach, 201, 217, 217Bartlett’s formula, 3, 13, 368

generalized, 14, 101, 103, 104, 124,359, 359

BEKK GARCH, 281Black–Scholes formula, 320, 323

GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoıan 2010 John Wiley & Sons, Ltd

Causal, see Nonanticipative solution, 44CCC (constant conditional correlations)

GARCH, 280estimation, 295stationarity, 290

CLT (central limit theorem)for α-mixing processes, 352for martingale differences, 111for martingale increments, 130, 134,

167, 171, 178,345, 347, 402

for stationary martingale differences, 95Coefficient of determination, 124Corner method, 106

DCC GARCH, 281Diagonal GARCH, 276Diffusions, 311–319

EGARCH (exponential GARCH), 246moments, 249stationarity, 247

Ergodicity of stationary processes, 13, 24,343

Factor GARCH, 284, 309BEKK representation, 309

FF (full-factor) GARCH, 285FGLS, 132–134

GARCHsemi-strong, 21, 80strong, 80weak, 82

Page 504: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

488 INDEX

GARCH(p, q)definitions, 19identification, 108kurtosis, 48moments, 45–46prediction of the squares, 53–57second-order stationarity, 27, 37strict stationarity, 24, 29–36, 30vector representation, 29

GARCH-M, 315Geometric ergodicity, see Markov chain, 68GJR-GARCH, 250

HAC (heteroscedasticity and autocorrelationconsistent), 105, 114, 122

Heteroscedasticity, 6conditional, 10

IGARCH(p, q), 28, 38Irreducibility, see Markov chain, 68Ito’s lemma, 314

Kurtosis, 48, 255

Lagrange multiplier test, 111–116, 124optimal, 228

LAN (local asymptotic normality), 226Least absolute deviations, 238

weighted, 238Leptokurticity, 9Leverage effect, 10, 246, 263, 271Likelihood ratio test, 194–204

optimal, 229Linearly regular, 4Long-run variance, 105, 114, 122Lyapunov exponent, 30, 30, 33, 36, 58, 73,

144, 253, 290, 295

Markov chain, 63–68, 313Martingale difference, 344Martingale increments, see Martingale

difference, 344Mixing coefficients, 348

ARCH(q), 74ARCH(1), 69GARCH(1,1), 71

ML for GARCH, 219–231asymptotic behavior, 221

comparison with the QML, 221missspecification, 219, 231one step estimator, 222

News impact curve, 251Nonanticipative solution, 24

O-GARCH (orthogonal GARCH), 285OLS, 127–131

asymptotic properties, 129–131constrained OLS, 135–137for GARCH(p, q), 181, 405

Options, 319

PC-GARCH, see O-GARCH, 285PCA (principal components analysis), 285,

309Persistence of shocks, 24Pitman’s approach, 202, 217, 217Portmanteau test, 97–100, 117Purely nondeterministic, 4

QGARCH (quadratic GARCH), 258QML for ARCH

nonstationarity, 168QML for ARMA-GARCH, 150, 151

asymptotic normality, 153, 154consistency, 152, 172

QML for GARCH, 141–143asymptotic law at the boundary, 190,

194, 207asymptotic normality, 159–168consistency, 143–145, 156–159non-Gaussian, 231, 232optimality condition, 221

Quasi-likelihood, 141

Random walk, 28RiskMetrics, 60, 327

Score test, see Lagrange multiplier test, 111,194–204

Self-weighted QMLE, 237Semi-parametric GARCH model, 223Stationarity

second order, 2multivariate, 273

strict, 1

Page 505: GARCH Modelshome.iitj.ac.in/~parmod/document/GARCH Forecasting Model.pdf · Notation xiii 1 Classical Time Series Models and Financial Series 1 1.1 Stationary Processes 1 1.2 ARMA

INDEX 489

Stochastic discount factor, 322–326Stochastic volatility model, 11, 84,

89, 321Student’s t test, 195, 197

TGARCH (threshold GARCH), 250stationarity, 251–255

VaR (value at risk), 327–331estimation, 334–337

Vector GARCH, 277stationarity, 287

Volatility, 9, 11, 331historical, 321

implied, 321realized, 337

Volatility clustering, 9, 21

Wald test, 194–204optimal, 227, 231

Weighted least squares, 236White noise, 13

multivariate, 274strong 2, 13weak 2, 13

Whittle estimator, 238Wold’s representation, 4

multivariate, 274