STUDIES IN COVARIANCE ESTIMATION AND APPLICATIONS IN FINANCE

A DISSERTATION SUBMITTED TO THE INSTITUTE FOR COMPUTATIONAL & MATHEMATICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Carl-Fredrik Arndt
May 2016



http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/gt735tt2921

© 2016 by Carl-Fredrik Arndt. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

George Papanicolaou, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Iain Johnstone

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Leonid Ryzhik

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

This thesis examines estimation of covariance and correlation matrices. More specifically, in the first part we study dynamical properties of the top eigenvalue and eigenvector of sample estimators of covariance and correlation matrices. This is done under the assumption that the top eigenvalue is separated from the others, which is reasonable when the data comes from financial returns. We show exactly how these quantities behave when the true covariance or correlation is stationary and derive theoretical values of related quantities that can be useful when quantifying the amount of non-stationarity in real data. We also validate the results using Monte Carlo simulations. A major contribution of the analysis is that it shows how, and in which regimes, correlation matrices differ from covariance matrices from a dynamic viewpoint. This effect has been observed in financial data, but never explained.

In the second part of the thesis we study modifications of covariance estimators that find the optimal estimator within a certain sub-class. Estimators of this type are generally known as shrinkage estimators, as they modify only the eigenvalues of the original estimator. We do this when the original estimator takes the form A^{1/2} X B X^T A^{1/2}, where A and B are matrices and X is a matrix of i.i.d. variables. The analysis is done in the asymptotic limit where the number of samples and the number of variables approach infinity jointly, so that random-matrix theory can be used. Our goal is to find the shrinkage estimator which minimizes the expected value of the Frobenius norm of the difference between the estimator and the true covariance matrix. To do this we first derive a generalization of the Marchenko-Pastur equation for this class of estimators. This theorem allows us to calculate the asymptotic value of the projection of the sample eigenvectors onto the true covariance matrix. We then show how to use these projections to find the optimal covariance estimator. Finally, we show with simulations that these estimators are close to the optimal bound when used on finite data sets.


Acknowledgments

Looking back at the five years I have spent at Stanford for my PhD, I can without a doubt say that my time here has exceeded any expectations I had when I first came. Obviously no place is better than its people, so I would like to thank all the people who have made my life here a pleasure.

To start with, I would like to thank my advisor Professor George Papanicolaou for his continuous support and thoughtful guidance of my research. He has constantly given me high-level ideas, suggestions of directions in which I can take my research, and feedback on it. More so, he has given me the freedom to explore whatever I find interesting, which is something I have highly valued. This has led me to explore numerous and very different topics during my time here. He has also encouraged and allowed me to do internships each summer, which has helped me gain knowledge within finance that cannot be taught through classes or research. I would also like to thank the other members of my committee, Professor Lenya Ryzhik, Professor Iain Johnstone, Professor Markus Pelger, and Professor Kay Giesecke, for taking the time to serve on my committee. I am also grateful to the Teknik Dr Marcus Wallenberg fund for supporting me during my last two years at Stanford.

I am very thankful for all the friends and colleagues I have met during my time at Stanford. The full list would be very long, but in particular I would like to thank Drew Schmitt, Nick McCoy, Michael Maas, Lewis Li, Mike Jones and Simon Ejdemyr for all the fun times together. I am also thankful to the Stanford Scandinavian society and all the people I have met through it. I am especially grateful to my wonderful girlfriend Lisann Muhlstein, whose love and support have been invaluable. Even at the worst times she still has the power to make me happy again. Lastly and most importantly, I am very grateful for all the love and support my family has given me. My mom and dad have always been there for me whenever I have needed them and have always believed in my ideas and plans, no matter how far-fetched they have been. The same goes for my grandparents Doris and Bertil, who are dearly missed. Without the support of my family, I would not have been here today.


Contents

Abstract

Acknowledgments

1 Introduction
  1.1 Problems in modeling and estimation of high-dimensional data
    1.1.1 Non-stationarity and financial data
    1.1.2 Non-linear estimators for data covariances
  1.2 Main results
    1.2.1 Covariance and correlation matrices of financial data
    1.2.2 Covariance estimation
  1.3 Further background on problems
    1.3.1 Covariance and correlation matrices of financial data
    1.3.2 Covariance estimation
  1.4 Summary of contributions, discussion & further directions
    1.4.1 Covariance and correlation matrices of financial data
    1.4.2 Covariance estimation
  1.5 Outline of thesis

2 Dynamics of the top eigenvalue and eigenvector of empirical correlations of financial data
  2.1 Literature review
  2.2 Return structure & assumptions
  2.3 Eigenvalue dynamics
    2.3.1 Covariance matrices
    2.3.2 Correlation matrices
  2.4 Eigenvector dynamics
    2.4.1 Covariance matrices
    2.4.2 Correlation matrices
  2.5 Extension to moving window estimators
    2.5.1 Covariance matrix
    2.5.2 Correlation matrix
  2.6 Simulation verification
    2.6.1 Eigenvalue results
    2.6.2 Eigenvector results
  2.7 Conclusions
  2.8 Appendix
    2.8.1 Variogram for Ornstein-Uhlenbeck process
    2.8.2 Variance of correlation differential
    2.8.3 Perturbation derivation of covariance eigenvector dynamics
    2.8.4 Eigenvector overlap for covariance matrices
    2.8.5 Perturbation derivation of correlation eigenvector dynamics
    2.8.6 Eigenvector overlap for correlation matrices
    2.8.7 Stationary distribution for x_t
    2.8.8 Variogram for x_t
    2.8.9 Variogram for λ_1^t in the case of a flat estimator

3 Optimal non-linear shrinkage for a large class of models and estimators
  3.1 Literature review
  3.2 On general large dimensional matrices
    3.2.1 Motivation
    3.2.2 The Marchenko-Pastur theorem and generalizations
  3.3 Asymptotically optimal bias correction for the eigenvectors
  3.4 Applications
    3.4.1 Robust estimators for elliptical data
    3.4.2 Spatio-temporal data
  3.5 Conclusions
  3.6 Appendix
    3.6.1 Proof of Theorem 3.2.1
    3.6.2 Proof of Theorems 3.3.1 & 3.3.2

Bibliography


List of Tables


List of Figures

2.1 Plot of the variance of λ_1 against the value of λ_1 for a one-factor model with constant variance factor and N = 100. We can see that when λ_1 > N/2 the value of the variance is decreasing. This can be contrasted with the case of the covariance eigenvalue, where the corresponding curve is a quadratic.

2.2 The plot depicts the ratio between the variance of the top eigenvalue of a correlation matrix and that of the corresponding covariance approximation. We can see here that for small values the variances are the same, but as λ_1 grows the values diverge strongly.

2.3 Plot of the asymptotic distribution p_∞(x) using the formula supplied in [1] (solid line) and using formula (2.21) (dotted line). We have chosen the parameters N = 200, ε = 0.02 and c_cov = 0.02, which implies that μ = 0.02. The vertical lines correspond to the calculated means of the distributions. The relative error of the mean for the formula supplied by [1] is ≈ 10%, which can be compared to the error of (2.21), which is ≈ 0.01%.


2.4 Plot of the asymptotic distribution p_∞(x) for a covariance matrix (solid line) and for the corresponding correlation matrix (2.21) (dotted line). We have chosen the parameters N = 50, ε = 0.02 and c_cov = 0.02, which implies that μ = 0.02. To get the correlation values we randomly generated a one-factor model with the mentioned parameters, which has λ_1 ≈ 23.5, b_corr ≈ 0.48 and c_corr ≈ 0.045. We then simulated this and calculated the α_i's and γ_i's from the simulated data. The vertical lines correspond to the calculated means of the distributions.

2.5 Variogram for the correlation eigenvalue (left) and covariance eigenvalue (right) in the case when K = 5 and ε = 0.04, using an EWMA estimator. The solid line represents the simulated average value and the dotted line is the theoretical value.

2.6 Stationary distribution for the correlation eigenvalue (left) and covariance eigenvalue (right) in the case when K = 5 and T = 1/ε = 20, using an EWMA estimator. The solid line represents the simulated average value and the dotted line is the theoretical value.

2.7 Variogram for the correlation eigenvalue (left) and covariance eigenvalue (right) in the case when K = 5 and ε = 0.04, using a moving window estimator. The solid line represents the simulated average value and the dotted line is the theoretical value.

2.8 Variogram for the correlation eigenvector (left) and covariance eigenvector (right) in the case when K = 1 and ε = 0.0001, using an EWMA estimator. The solid line represents the simulated average value and the dotted line is the theoretical value.

3.1 Plot of the PRIAL of the shrinkage estimator retrieved by using (3.15) for both the sample estimator and Maronna's estimator. Each point on the curves is generated by using 200 simulations from the same model. Each model is generated by setting Σ equal to a matrix with standard Gaussian entries.


3.2 Plot of the PRIAL of the shrinkage estimators retrieved by using (3.17) for both the sample estimator (left) and Maronna's estimator (right). We also compare this to the matrix obtained by taking the inverse of the shrunken covariance matrix for the corresponding base matrix. Each point on the curves is generated by using 200 simulations from the same model. Each model is generated by setting Σ equal to a matrix with standard Gaussian entries.

3.3 Plot of the PRIAL of the shrinkage estimators for A_n (left) and B_n (right) when using the sample estimator as the base estimator. Each point on the curves is generated by using 200 simulations from the same model. The population A and B are generated in the same way as Σ in the previous section.


Chapter 1

Introduction

In this thesis we examine estimation of covariance and correlation matrices. The first part of the thesis focuses on a situation with assumptions that are reasonable for financial data. Our interest lies in studying the dynamical properties of the top eigenvalue and eigenvector for different sample estimators of covariance and correlation matrices. The major assumption here is that the top eigenvalue is well separated from the others. In general this is true in systems where the majority of the variables have a strong positive correlation, as these will have an eigenvector where all elements are positive and a corresponding eigenvalue approximately equal to the average correlation of the variables times the population size. For financial data it is normally found that this top eigenvector corresponds to a portfolio which has the same characteristics as the market index. Hence the corresponding eigenvalue determines how much of the overall movement comes from this collective market portfolio. As such this pair contains a lot of useful information and will turn out to be a good proxy for quantifying the amount of non-stationarity that exists in the market. Here we derive the dynamics of these quantities for estimated covariance or correlation matrices in the circumstance that the true matrix is stationary. Hence this approach serves as a benchmark when one tries to quantify the amount of non-stationarity that exists in a real data set. All our analysis is done using perturbation theory and a continuous-time approximation. After doing so we also validate the results with Monte Carlo simulations. An interesting feature of the analysis is that it proves and gives good insights into why correlation matrices differ from covariance matrices from a dynamic viewpoint. It has previously been observed that correlation matrices exhibit more stability than covariance matrices, which has been explained in various ways. However, as we show, this effect is to be expected in this regime due to their differences in definition.

In the second part of the thesis we change focus to look at how estimation can be done optimally. More specifically, we assume that we are given some estimator of the data which has the form A^{1/2} X B X^T A^{1/2}, where A and B are matrices and X is a matrix of i.i.d. variables. We look at modifications of this estimator, constrained to have the same eigenvectors as the original estimator. This means that our task is to find a way to choose the eigenvalues. We do that so that the estimator minimizes the expected distance to the true covariance matrix under the Frobenius norm. As one can easily find the solution in terms of the true covariance matrix, we seek to reduce this quantity to depend only on the spectrum of the true covariance matrix. The path to this starts by looking at the asymptotic limit where both the number of samples and the number of variables approach infinity jointly, so that random-matrix theory can be used. Here this quantity can be expressed in terms of functionals of the spectrum. We proceed by first deriving a generalization of the Marchenko-Pastur equation for the class of estimators we are studying. This theorem allows us to calculate asymptotic values of projections between the sample eigenvectors and the true covariance matrix, which can then be used to find the optimal covariance estimator. As mentioned, all the analysis is done in the thermodynamic limit, and by definition this means that it is only valid when the number of observations goes to infinity. Hence it is of great interest to see how well this method works in actual situations where the amount of data is finite. To do this we illustrate that these estimators are close to the optimal bound, using two different applied problems with simulated data. These examples also illustrate the two distinct types of situations in which this problem can come up, i.e. both when the regular sample estimator takes the given form and when we already have some other estimator (e.g. a robust estimator) that we are trying to improve.


1.1 Problems in modeling and estimation of high-dimensional data

Covariance and correlation matrices are central to many applications, such as finance, signal processing, climatology and biostatistics. For example, in finance they are used to describe the relationships between financial assets, which can then be used to find good allocations among these [42], manage risk [41] or price derivatives [16]. They are also strongly connected to each other, as one can deduce the correlation matrix from the covariance matrix by removing its variance influence. In some respects they behave similarly; however, not all properties that hold for covariance matrices transfer directly to correlation matrices.

If we let Y \in \mathbb{R}^{n \times N} be a matrix containing N realizations of some n-dimensional random variable with mean 0, then the sample covariance estimator is defined as

S = \frac{1}{N} Y Y^T = \frac{1}{N} \sum_{i=1}^{N} Y_{\cdot,i} Y_{\cdot,i}^T, \qquad (1.1)

where Y_{\cdot,i} is the ith column. In a similar way, the sample correlation estimator is defined as

C = \frac{1}{N} V^{1/2} Y Y^T V^{1/2}, \qquad (1.2)

where V is the diagonal matrix with elements V_{i,i} = 1/S_{i,i}, so that V^{1/2}_{i,i} = 1/\sqrt{S_{i,i}}.

An alternative and equivalent way to view this, which simplifies some analysis later, is that we have samples \Sigma^{1/2} X \in \mathbb{R}^{n \times N}, where \Sigma \in \mathbb{R}^{n \times n} is the true covariance matrix and X \in \mathbb{R}^{n \times N} is a matrix containing normalized and independent random variables. Under this formalism the sample estimator takes the form

S = \frac{1}{N} \Sigma^{1/2} X X^T \Sigma^{1/2}. \qquad (1.3)
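As a minimal numerical illustration (our own sketch, with arbitrary dimensions, not part of the original analysis), the estimators (1.1) and (1.2) can be computed in a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 1000                    # n variables, N samples (illustrative values)
Y = rng.standard_normal((n, N))   # columns are mean-zero observations

# Sample covariance estimator (1.1): S = (1/N) Y Y^T
S = Y @ Y.T / N

# Sample correlation estimator (1.2): C = (1/N) V^{1/2} Y Y^T V^{1/2},
# where V^{1/2} is diagonal with entries 1/sqrt(S_ii)
v = 1.0 / np.sqrt(np.diag(S))
C = v[:, None] * S * v[None, :]

print(np.allclose(np.diag(C), 1.0))  # a correlation matrix has unit diagonal
```

Rescaling rows and columns by broadcasting with `v` is equivalent to the matrix product with the diagonal matrix V^{1/2}, but avoids forming it explicitly.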

Notice that even though our interest here lies purely in the estimation of the covariance matrix, this is rarely the end goal in any application. For example, in portfolio theory one might seek the optimal portfolio allocation vector, which is a function of the precision (inverse covariance) matrix. The standard approach is then to find an estimator of the covariance, invert it and plug it into a given formula [29]. However, it is clear that this approach is sub-optimal, as the end goal is disconnected from the estimation procedure.

The sample covariance and correlation estimators can be shown to be asymptotically efficient under classical asymptotic statistical theory, i.e. when n is fixed and N → ∞. However, this does not mean that they are effective for all practical purposes. In particular, we will in this thesis study two particular situations where we will see how they are limited. First we will look at one instance of how they differ under dynamical conditions. Secondly we will study how the sample covariance matrix can be optimally improved for a given estimation problem.

1.1.1 Non-stationarity and financial data

First we will assume that the data may not be stationary, meaning that the parameters/distribution are possibly not the same for all samples. More explicitly, we will assume that our data comes as a time series and that the covariance may not be constant in time. This would mean that each term in (1.1) has a different mean and thus that the estimator is not consistent. To overcome this we could, for example, discard samples whose mean we expect to differ too much from the true covariance. This approach is probably the one most commonly taken in finance, where it is a fact that the covariance is not stationary [9]. So for example when dealing with daily data, one normally only uses observations from the last 1-2 years and throws away any remaining data. Another way to think of this approach is that each sample receives a weight w_i, which in this case is 1/T_w, with T_w being the number of days within the window, if the sample is within the window, and 0 otherwise. Our estimator thus takes the form

S_w = \sum_{i=1}^{N} w_i X_{\cdot,i} X_{\cdot,i}^T. \qquad (1.4)
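The equivalence between the weighted form (1.4) with flat weights and the plain sample estimator restricted to the window can be checked numerically; this sketch uses our own variable names and arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, Tw = 4, 500, 100            # illustrative sizes; Tw is the window length
X = rng.standard_normal((n, N))

# Weighted estimator (1.4): w_i = 1/Tw inside the last Tw samples, 0 otherwise
w = np.zeros(N)
w[-Tw:] = 1.0 / Tw
Sw = (X * w) @ X.T                # equals sum_i w_i x_i x_i^T via broadcasting

# The same matrix, computed as the plain sample estimator on the window alone
Swin = X[:, -Tw:] @ X[:, -Tw:].T / Tw
print(np.allclose(Sw, Swin))
```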


Note here that each sample within the window is considered equally important, as each is given the same weight. A slightly different approach is to assume that a sample's importance decays exponentially with time. If we let κ denote this decay rate and assume that we have access to an infinite amount of observations, the estimator can be written as

S_{\exp,t} = \frac{1}{\sum_{\tau=0}^{\infty} e^{-\kappa\tau}} \sum_{\tau=0}^{\infty} e^{-\kappa\tau} X_{\cdot,t-\tau} X_{\cdot,t-\tau}^T. \qquad (1.5)

Here the first factor comes from the constraint that the weights have to sum to 1 to remain unbiased. If we set \varepsilon = e^{-\kappa}, then it follows that

\frac{1}{\sum_{\tau=0}^{\infty} e^{-\kappa\tau}} = \frac{1}{\sum_{\tau=0}^{\infty} \varepsilon^{\tau}} = \frac{1}{1/(1-\varepsilon)} = 1 - \varepsilon,

so that the exponentially weighted moving average (EWMA) estimator can be written as

S_{\exp,t} = (1-\varepsilon) \sum_{\tau=0}^{\infty} \varepsilon^{\tau} X_{\cdot,t-\tau} X_{\cdot,t-\tau}^T. \qquad (1.6)
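In practice (1.6) is evaluated recursively: with a finite history started at S = 0, the update S_t = ε S_{t-1} + (1-ε) x_t x_t^T reproduces the truncated direct sum exactly. A small sketch, with an illustrative decay value of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 3, 200
eps = 0.9                       # decay factor eps = exp(-kappa); illustrative value
X = rng.standard_normal((n, T))

# Direct form (1.6), truncated to the available history (tau = 0 is the newest sample)
w = (1 - eps) * eps ** np.arange(T)
S_direct = sum(w[t] * np.outer(X[:, T - 1 - t], X[:, T - 1 - t]) for t in range(T))

# Equivalent recursive update, convenient for streaming data
S = np.zeros((n, n))
for t in range(T):
    x = X[:, t]
    S = eps * S + (1 - eps) * np.outer(x, x)

print(np.allclose(S, S_direct))
```

Note that with a finite history the weights sum to 1 - eps**T rather than 1; both forms above share that truncation.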

This estimator is commonly used as a benchmark for risk measurement within the financial industry [43], where it is suggested to set ε = 0.06 when using daily data. However, which value to use depends on the data, and so the task of choosing it is highly non-trivial. Different types of weights have also been studied under different assumptions and goals.

The exponentially weighted estimator can compensate for the effect of non-stationarity in the data; however, it does not determine whether the covariance is actually non-stationary or not. In order to achieve this task one could, for example, study the evolution of some given norm of this estimator. The main issue with this approach is that it can tell us that something is varying, but it does not tell us where this variation comes from and why. In finance this is particularly bad, as it is known that assets exhibit a high degree of heteroscedasticity [20]. For example, in [5] it is shown that the volatility structure of a single asset is well fitted by a so-called GARCH(p,q) model, i.e. its volatility σ_t at time t can be modeled as

\sigma_t^2 = \omega + \sum_{\tau=1}^{q} \alpha_\tau \epsilon_{t-\tau}^2 + \sum_{\tau=1}^{p} \beta_\tau \sigma_{t-\tau}^2, \qquad (1.7)

where p, q are positive integers and \omega, \{\alpha_i\}_{i=1}^{q}, \{\beta_i\}_{i=1}^{p} are parameters that can be found empirically. After fitting this model one can observe that the amount of volatility can change by orders of magnitude between different time periods.
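The recursion (1.7) is straightforward to filter; a sketch for the simplest case p = q = 1, with parameter values that are purely illustrative (not fitted to any data):

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta, sigma2_0):
    """Filter the GARCH(1,1) conditional variance, eq. (1.7) with p = q = 1:
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = sigma2_0
    for t, r in enumerate(returns):
        sigma2[t + 1] = omega + alpha * r**2 + beta * sigma2[t]
    return sigma2

rng = np.random.default_rng(3)
r = 0.01 * rng.standard_normal(500)   # stand-in daily returns
s2 = garch11_variance(r, omega=1e-6, alpha=0.08, beta=0.9, sigma2_0=1e-4)
print(bool(np.all(s2 > 0)))           # conditional variances stay positive
```

With omega > 0 and alpha, beta >= 0 the filtered variances are positive by construction, which is one reason this parameterization is convenient.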

Hence it is already expected that the covariance matrix will have fluctuations due to the changes in volatility. As these changes can be orders of magnitude in size, it is expected that they will dominate the fluctuations of the covariance matrix. As there also already exist good methods to understand these fluctuations (like the GARCH model), these are not the type of structural changes we are interested in. This issue is partly dealt with by looking at the correlation instead of the covariance matrix, as the correlation matrix focuses purely on the structure between the assets. To get a better understanding of the structure, one usually uses Principal Component Analysis (PCA). The general idea of PCA is to compute the eigenvalue decomposition of the covariance matrix and then only use the first k eigenvectors as a proxy for the covariance matrix (where k is some positive integer less than n), i.e. one sets the (k+1)th, ..., nth eigenvalues to 0. This can equally well be done for the correlation matrix. This approach has had a lot of success and is computationally easy. It has led to an interest in studying the spectrum (eigenvalues) of estimators with data from different applications. Depending on the source of the data, the spectrum may have very different traits.

For equity data it has especially been noted that one always has a top eigenvalue that is orders of magnitude larger than the remaining eigenvalues, for both the covariance and correlation matrix. The elements of the corresponding eigenvector are also positive and, when scaled by the volatility, correspond to the relative market capitalizations, i.e. the fraction of capital each stock has compared to the capital of all stocks. As this scalar-vector pair hence contains a lot of the information in the covariance or correlation matrix, it is a good quantity to use to quantify the stationarity. Because of this, our goal is to study how these quantities evolve in a model where the true covariance/correlation matrix is fixed. This can serve as a baseline model when comparing to empirical data, as one then expects similar behavior but with more fluctuations due to non-stationarity.
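The claim that a strong common correlation produces one dominant, single-signed "market mode" can be seen in the extreme case of a constant-correlation matrix, where the top eigenvalue is exactly 1 + (N-1)ρ ≈ ρN. A sketch with values of our own choosing:

```python
import numpy as np

N, rho = 100, 0.3
# Constant-correlation matrix: 1 on the diagonal, rho off-diagonal
C = np.full((N, N), rho) + (1 - rho) * np.eye(N)

vals, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order
lam1, v1 = vals[-1], vecs[:, -1]

print(np.isclose(lam1, 1 + (N - 1) * rho))   # top eigenvalue ~ rho * N
print(np.all(v1 > 0) or np.all(v1 < 0))      # eigenvector has one sign: "market mode"
```

The remaining N - 1 eigenvalues all equal 1 - rho, so the gap to the top eigenvalue grows linearly with N, matching the separation assumption made in the text.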

1.1.2 Non-linear estimators for data covariances

The second limitation of the sample estimator (1.1) comes from the fact that we never have an infinite amount of samples. More than that, we are far from the regime where this would even be approximately correct. Normally you want the number of samples to be large in comparison to the number of parameters you are trying to estimate; e.g. if you want to estimate 1 parameter from N samples, you want N to be large. Here we are trying to estimate n(n-1)/2 parameters from nN samples, meaning that we effectively have 2N/(n-1) samples per parameter. So for the classical type of asymptotics to be valid, we need N/n to be infinite or very large. But if we look at a typical example from finance, one may want to estimate the covariance between the S&P 500 stocks (the largest 500 stocks on the US equity market) using data from the last 2 years (520 days), as any data prior to that does not reflect the current market situation. By the previous discussion we thus get about 2 unique samples per parameter, which is far from infinite. This phenomenon is not unique to finance, but also comes up in other fields like genomics, signal processing and physics. Hence this regime, where the population size and the number of samples are of the same order of magnitude, has gained significant attention. Within covariance estimation, its roots can be traced back to Marchenko-Pastur [39]. If we let λ_1, ..., λ_n be the eigenvalues of S, then let the empirical spectral distribution H_n of S be defined as

H_n(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{(\lambda_i,\infty]}(\lambda). \qquad (1.8)
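In code, (1.8) is simply the fraction of eigenvalues strictly below the evaluation point; a one-line sketch (function name is our own):

```python
import numpy as np

def esd(eigvals, lam):
    """Empirical spectral distribution H_n of eq. (1.8):
    the fraction of eigenvalues lambda_i with lambda > lambda_i."""
    return np.mean(np.asarray(eigvals) < lam)

eigvals = [0.5, 1.0, 1.5, 3.0]
print(esd(eigvals, 1.0))   # 0.25
print(esd(eigvals, 10.0))  # 1.0
```

The convention at the jump points (strict versus non-strict inequality) does not affect the limiting distribution.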

Marchenko and Pastur showed that when N/n converges to some fixed number, the resolvent of the estimator, m_{\hat S}(z) = \frac{1}{n}\operatorname{Tr}\left[(S - zI)^{-1}\right], converges to the Stieltjes transform of a limiting density H. This in turn satisfies the integral equation

m_H(z) = \int_{\mathbb{R}} \frac{1}{\tau \left[1 - c - c z\, m_{\hat S}(z)\right] - z} \, dH(\tau), \qquad (1.9)

where c = \lim_{n\to\infty} n/N. It is hard to deal with this equation for a general measure H, but in the case where the true covariance has all eigenvalues equal to 1, it can be simplified to show that the density of the estimated eigenvalues equals

p(x) = \frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{2\pi c x}, \qquad x \in [\lambda_-, \lambda_+], \qquad \lambda_\pm = \left(1 \pm \sqrt{c}\right)^2. \qquad (1.10)
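As a quick sanity check on the reconstructed formula (1.10), the density can be evaluated numerically and integrated over its support; for c <= 1 the total mass should be 1. A sketch with an arbitrary aspect ratio c:

```python
import numpy as np

def mp_density(x, c):
    """Marchenko-Pastur density (1.10) for identity true covariance, 0 < c <= 1."""
    lam_minus, lam_plus = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    out = np.zeros_like(x)
    inside = (x > lam_minus) & (x < lam_plus)
    xi = x[inside]
    out[inside] = np.sqrt((lam_plus - xi) * (xi - lam_minus)) / (2 * np.pi * c * xi)
    return out

c = 0.5
lam_minus, lam_plus = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
grid = np.linspace(lam_minus, lam_plus, 200_001)
y = mp_density(grid, c)
mass = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))  # trapezoidal rule
print(abs(mass - 1) < 1e-3)  # the density integrates to 1
```

For c > 1 (more variables than samples) the estimator is singular and the spectral measure has an additional point mass of weight 1 - 1/c at zero, which this continuous density does not capture.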

This is normally referred to as the Marchenko-Pastur distribution. It is clear from this that we have a strong deviation from the true spectrum. Though this can partly tell us how far from its true value the sample covariance is, it does not directly give any insight into how to correct it. Methods like PCA essentially try to correct this effect by only using information from eigenvalues which deviate from the others, i.e. which are far from the upper end of the Marchenko-Pastur spectrum. This led to research on how to use this additional information to choose the number of components, both empirically in the econophysics literature [30] and theoretically in the statistics literature through so-called spiked-population models [26]. The thought behind this is that informative eigenvalues will hopefully separate from the 'bulk' (all uninformative eigenvalues), and as we do not have any prior information on the eigenvalues in the bulk we let them all be equal. Despite its simplicity and approximate nature, this approach has been shown to be effective.

We will here look at a more extensive version of what is described above. Namely, assume that we are given an estimator S with the eigendecomposition

S = U D U^T, \qquad D = \operatorname{Diag}(\lambda_1, \ldots, \lambda_n), \qquad (1.11)

where U is the matrix containing the eigenvectors of S and λ_1, ..., λ_n are the eigenvalues. From this we will try to find the optimal estimator \tilde S, under the criterion that it has the same eigenvectors as S, i.e.

\tilde S = U \tilde D U^T, \qquad \tilde D = \operatorname{Diag}(d_1, \ldots, d_n). \qquad (1.12)

The optimality condition will be to minimize the risk function that corresponds to taking the Frobenius norm of the difference as the loss function. This means that the loss function is

L(\Sigma, \tilde S) = \|\Sigma - \tilde S\|_F, \qquad \text{where } \|X\|_F = \sqrt{\operatorname{Tr} X X^T}. \qquad (1.13)

However, as we constrain \tilde S to be of the form (1.12), it follows from the invariance of \|\cdot\|_F under the orthogonal change of basis U that

L(\Sigma, \tilde S) = \|U^T \Sigma U - \tilde D\|_F = \sqrt{\sum_{i=1}^{n} \left[ U_{\cdot,i}^T \Sigma\, U_{\cdot,i} - d_i \right]^2}. \qquad (1.14)

Thus, because of the decoupling between the different eigenvalues, it is clear that the optimal choice is

$$d_i = U_{\bullet,i}^T \Sigma U_{\bullet,i}, \qquad \forall i = 1, \dots, n. \tag{1.15}$$

Hence the optimal shrinkage function lets the shrinkage value be the projection of the corresponding sample eigenvector onto the true covariance matrix. One might at first find this a bit counterintuitive, as one might think the optimal estimator should simply contain the true eigenvalues; however, this is only true when the sample and population eigenvectors align. If one knows the true covariance matrix, then one can calculate this estimator exactly. Estimators of this type, i.e. estimators depending on an unobservable parameter, are known as oracle estimators. Theory then tells us what the best possible estimator of this type can achieve. However, when the unobserved parameters are replaced by some estimators, optimality is no longer guaranteed. Our goal will thus be to study these projections in the thermodynamic limit.
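The projection formula (1.15) can be verified directly in finite dimensions. The sketch below (assuming NumPy; the spiked Σ is an arbitrary illustrative choice) builds the oracle estimator from the sample eigenvectors and the true covariance, and checks that its Frobenius loss is no larger than that of the sample estimator itself, or of the sample eigenvectors naively recombined with the true eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 50, 100

# An arbitrary "true" covariance with a few separated eigenvalues (illustrative).
true_eigs = np.concatenate([[10.0, 5.0], np.ones(n - 2)])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Sigma = Q @ np.diag(true_eigs) @ Q.T

X = np.linalg.cholesky(Sigma) @ rng.standard_normal((n, N))
S = X @ X.T / N
lam, U = np.linalg.eigh(S)           # sample eigenvalues (ascending) and eigenvectors

# Oracle shrinkage (1.15): d_i is the projection of the i-th sample eigenvector
# onto the true covariance, not the true eigenvalue itself.
d = np.einsum('ji,jk,ki->i', U, Sigma, U)
S_oracle = U @ np.diag(d) @ U.T

loss = lambda M: np.linalg.norm(M - Sigma, 'fro')
# Within the class of estimators sharing S's eigenvectors, (1.15) is optimal, so
# it beats both the raw sample estimator and the sample eigenvectors recombined
# with the *true* eigenvalues.
S_naive = U @ np.diag(np.sort(true_eigs)) @ U.T
print(loss(S_oracle) <= loss(S))        # True
print(loss(S_oracle) <= loss(S_naive))  # True
```

The second comparison illustrates the counterintuitive point above: reinserting the true eigenvalues is suboptimal whenever the sample and population eigenvectors do not align.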

Note that a requirement for the Marchenko-Pastur equation to hold was that the covariance estimator could be written as (1.3). However, this is far from true in general. There are several reasons why it may fail: for example, we may use some estimator other than the regular sample estimator, or there might be some additional underlying structure in the data. An example of the first scenario is if we were to use a weighted estimator, as in Section 1.1.1. These can be written S = Σ^{1/2}XWX^TΣ^{1/2}, where W is the diagonal matrix containing the chosen weights for each sample. An example of the second scenario is if the data X is generated from a linear time series. It can then be shown [37], under some constraints on the coefficients, that the regular sample estimator can be written as S = Σ^{1/2}XBX^TΣ^{1/2}, where Σ corresponds to the covariance at each time step and B represents the autocovariance. Another example of this is if the data comes from an elliptical distribution, as the sample estimator can then be shown to take the same form [19]. Because of this, we have accordingly chosen to look at models where the sample estimator to be shrunken takes the form

$$S = \frac{1}{N} YY^T = \frac{1}{N} A^{1/2} X B X^T A^{1/2}, \tag{1.16}$$

which thus, in contrast to (1.3), has the extra matrix B. This matrix can be either deterministic (as in the two examples mentioned above) or stochastic, as in e.g. the sample estimator for an elliptical distribution.

1.2 Main results

1.2.1 Covariance and correlation matrices of financial data

Going back to Section 1.1.1, we will in this thesis study the dynamics of the top eigenvalue and eigenvector of covariance and correlation matrices. We will do this using mainly the EWMA estimator (1.6), but also other estimators. The main idea here is to determine what type of fluctuations to expect of an estimator if the covariance matrix is in fact static. Obviously, even though the underlying covariance/correlation is fixed, the estimators are still going to fluctuate. As for the case


when the actual covariance is non-static, one expects the fluctuations of the estimators to always be larger than in the static case (unless there is extreme behavior through correlations between the observed value and the future change in covariance). Thus the model here, where the covariance is static, can be thought of as a null model, and deviations from it can be interpreted as changes of the covariance/correlation structure.

We will denote the top eigenvalue estimated at time t by λ₁ᵗ, for both covariance and correlation matrices. Similarly, φ₁ᵗ will denote the corresponding eigenvector. For the eigenvector it will be more informative to study its angle with the true eigenvector φ₁, defined through cos(θₜ) = (φ₁ᵗ)ᵀφ₁. Our assumption that the top eigenvalue is much larger than the others also makes this angle close to 0; thus it is more informative to look at xₜ = 1 − cos(θₜ), which is consequently expected to be close to 0.

For the analysis we will assume that the covariance matrix of the returns has a top eigenvalue which is much bigger than the remaining ones. We do not quantify the exact separation necessary, as most results are still asymptotic and approximate, since perturbation theory is used. We will also assume that there is a total of n returns and that these have a normal distribution with mean 0. The assumption of zero mean could easily be relaxed, as it will not change the results. As the EWMA estimator is the easiest to deal with in this setting, the majority of the results will be derived using it. We will then use the assumption that εn converges to a constant while ε → 0. As 1/ε corresponds to the effective number of samples used, the first condition can be thought of as the one used in random matrix theory, namely that the ratio of the population size to the number of samples is kept fixed while the number of samples goes to infinity. We will also assume that the time between samples, Δt, goes to 0.
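As a concrete reference point for what follows, here is a minimal sketch of an EWMA covariance estimator applied to static data. The recursion Sₜ = (1 − ε)Sₜ₋₁ + ε xₜxₜᵀ is an assumed standard form (the estimator (1.6) itself is defined earlier in the thesis and not reproduced here), and all parameter values are illustrative.

```python
import numpy as np

def ewma_cov(returns, eps, S0):
    """Recursive exponentially weighted covariance estimate (assumed standard
    form S_t = (1-eps)*S_{t-1} + eps*x_t x_t^T); 1/eps acts as an effective
    sample size. Returns the final estimate and the top-eigenvalue path."""
    n, T = returns.shape
    S = S0.copy()
    top_eig = np.empty(T)
    for t in range(T):
        x = returns[:, t]
        S = (1.0 - eps) * S + eps * np.outer(x, x)
        top_eig[t] = np.linalg.eigvalsh(S)[-1]   # lambda_1^t
    return S, top_eig

rng = np.random.default_rng(2)
n, T, eps = 20, 4000, 0.01          # eps*n kept moderate, as in the limit above
rho = 0.3                           # illustrative one-spiked correlation level
Sigma = rho * np.ones((n, n)) + (1 - rho) * np.eye(n)
X = np.linalg.cholesky(Sigma) @ rng.standard_normal((n, T))

S, top_eig = ewma_cov(X, eps, S0=Sigma)
lam1 = np.linalg.eigvalsh(Sigma)[-1]            # true top eigenvalue, 1 + (n-1)*rho
print(lam1, top_eig[-1000:].mean())
```

Even though the underlying covariance is static, the estimated top eigenvalue fluctuates around the true value λ₁ = 1 + (n − 1)ρ, which is exactly the null-model behavior described above.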

Under the limits in the above assumptions, we show that the top eigenvalue of the covariance/correlation matrix, obtained from the EWMA estimator of the covariance matrix, satisfies an SDE of the form

$$d\lambda_1^t = \varepsilon\left[\left(\cos^2(\theta_t)\,\lambda_1 - \lambda_1^t\right)dt + \sigma\cos^2(\theta_t)\,dB_t\right], \tag{1.17}$$


where ε is the parameter from the EWMA estimator and Bₜ is a Brownian motion. Here

$$\sigma^2 = \begin{cases} 2\lambda_1^2 & \text{for covariance matrices,} \\ 2(1+\alpha_1)\lambda_1^2 - 4\gamma_1\lambda_1 & \text{for correlation matrices,} \end{cases}$$

where α₁ and γ₁ are given by (2.16). As these quantities are quite complex we will not discuss their exact form here. However, a way to get some intuition for their behavior is to look at them under a one-spiked model of the correlation matrix [38]. It then follows that α₁ ≈ ρ² and γ₁ ≈ λ₁ρ, where $\rho = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j\neq i} C_{i,j}$ is the average correlation among the securities. Thus we can see that σ² ≈ 2(1−ρ)²λ₁², which can be contrasted with the covariance case, where the variance is 2λ₁². This says that the larger the average correlation is, the less the top eigenvalue is going to fluctuate. This behavior is intuitive, as can be seen by examining the extreme case where all assets in the market correlate perfectly and have non-zero variance. There, at any point in time we can estimate the top eigenvalue of the correlation matrix perfectly, as it is equal to the number of assets, whereas the top eigenvalue of the covariance matrix will always fluctuate strongly. For small values of λ₁ the behavior of the correlation eigenvalue is very similar to the covariance case, meaning that the less separated the top eigenvalue is from the remaining ones, the more the correlation matrix behaves like a covariance matrix.

We have plotted the exact values of these two variance functions (i.e. not using the approximate values of γ₁ and α₁) in Figure 2.2. We can see there that the above approximation is in good agreement with direct numerical simulations. By again using the one-spiked correlation approximation, we further see that σ² ≈ 2nρ²(1−ρ)². This function is maximal when ρ = 1/2, which agrees well with what we can see in Figure 2.2.

As for the process xₜ representing the top eigenvector, we have shown in Sections 2.4.1 and 2.4.2 that, in both the covariance and correlation cases, it satisfies an SDE of the form

$$dx_t = 2\varepsilon\,(\mu - x_t)\,dt + \varepsilon b\sqrt{2x_t(4x_t + c)}\,dB_t. \tag{1.18}$$


Here the constants μ, b and c differ between the covariance and the correlation cases. Note that the process xₜ corresponds to 1 minus the cosine of an angle, and hence it has to be supported on [0, 2]. One way to make sure it stays positive (following [22]) is to look at the stationary distribution and make sure that it is well defined at 0. We show that the stationary distribution is

$$p_\infty(x) \propto \frac{1}{2x}\left(\frac{x}{4x + c}\right)^{\frac{2\mu}{\varepsilon c b^2}}\left(\frac{1}{4x + c}\right)^{\frac{1}{2\varepsilon b^2} + 1}. \tag{1.19}$$

From this we can see that the corresponding condition is that 2μ > εb²c, which we will refer to as the Feller condition.
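The Feller-type condition can be sanity-checked by discretizing (1.18). The Euler-Maruyama sketch below (assuming NumPy; parameter values are illustrative, not taken from the thesis) verifies that the process relaxes to its stationary mean μ and stays far below the hard upper bound 2.

```python
import numpy as np

def simulate_x(mu, b, c, eps, T=200_000, dt=0.01, seed=3):
    """Euler-Maruyama discretization of the eigenvector SDE (1.18),
    dx = 2*eps*(mu - x) dt + eps*b*sqrt(2x(4x + c)) dB,
    with a reflection at 0 to guard against discretization overshoot."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(T) * np.sqrt(dt)
    x = mu
    path = np.empty(T)
    for t in range(T):
        drift = 2.0 * eps * (mu - x)
        diff = eps * b * np.sqrt(2.0 * x * (4.0 * x + c))
        x = abs(x + drift * dt + diff * shocks[t])
        path[t] = x
    return path

# Illustrative parameters chosen to satisfy the Feller-type condition 2*mu > eps*b**2*c.
mu, b, c, eps = 0.05, 1.0, 0.05, 0.1
assert 2 * mu > eps * b ** 2 * c
path = simulate_x(mu, b, c, eps)
stat = path[50_000:]                 # discard the transient
print(stat.mean())                   # fluctuates around the stationary mean mu
print(stat.min() >= 0.0, stat.max() < 2.0)
```

Since the drift in (1.18) is linear, the stationary mean equals μ regardless of the diffusion term, which makes this a convenient check on the discretization.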

In the covariance case the coefficients are given by b_cov = 1,

$$\mu_{\mathrm{cov}} = \frac{\varepsilon\sum_{i\neq 1}\lambda_i}{4\lambda_1}, \qquad c_{\mathrm{cov}} = \frac{\sum_{i\neq 1}\frac{\lambda_i^2}{\lambda_1-\lambda_i}}{\lambda_1\sum_{i\neq 1}\frac{\lambda_i}{\lambda_1-\lambda_i}}.$$

In the correlation case they are given by

$$\mu_{\mathrm{corr}} = \frac{\varepsilon}{4}\left[\sum_{i\neq 1}\frac{\lambda_i}{\lambda_1} + \sum_{i\neq 1}\frac{\left(1+\frac{\lambda_i}{\lambda_1}\right)^2}{2}\,\alpha_i - \frac{2}{\lambda_1}\sum_{i\neq 1}\gamma_i\left(1+\frac{\lambda_i}{\lambda_1}\right)\right],$$

$$b_{\mathrm{corr}} = \sqrt{1 + \alpha_1 - \frac{2\gamma_1}{\lambda_1}},$$

$$c_{\mathrm{corr}} = \frac{\sum_{i\neq 1}\frac{\left[\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)\right]^2}{\lambda_1-\lambda_i}}{\lambda_1^2\, b_{\mathrm{corr}}^2 \sum_{i\neq 1}\frac{\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)}{\lambda_1-\lambda_i}}. \tag{1.20}$$

Here αᵢ and γᵢ are defined in (2.25). Thus the correlation case is again substantially more complex. In this case we can use a one-spiked correlation model to gain some intuition. Under it, b_corr ≈ 1 − ρ, so the higher the average correlation, the lower the fluctuations. Also, μ_corr ≈ (nε/4)·λ₂³/λ₁ and c_corr ≈ μ_corr/b²_corr. Similarly, for a one-spiked covariance model it holds that μ_cov ≈ (nε/4)·λ₂/λ₁ and c_cov ≈ λ₂/λ₁. It is thus clear that in both the covariance and the correlation case the Feller-type condition is verified. As for the upper bound, μ is close to 0 and the fluctuations are of order εb, and thus very small. Hence we can safely say that the process will remain much smaller than 2.

To compare the coefficients between the two cases, we start by noting that in a one-spiked correlation model it is true that λ₂ ≈ 1 − ρ. This means that the formulas for the correlation case will yield much lower values than the corresponding ones for covariance matrices. Also note that since ρ ≫ 0 and b gives the average fluctuation level, there will be a higher level of fluctuations in the covariance case. To emphasize: the correlation eigenvector can thus be better estimated, both in terms of a smaller bias and smaller fluctuations. We have also verified numerically that this holds in the general case (i.e. with more than one spike).

A potential issue with all the results is that they are based on perturbation theory and are thus only approximate. To verify that the results are correct in the limit, we have therefore conducted Monte-Carlo simulations. In these we simulate the price process and then calculate the time series of the eigenvalues and eigenvectors, from which we can calculate any statistic we are interested in. As some parameters in the corresponding functional forms of these statistics are themselves expressed as averages of the eigenvalues and eigenvectors, we also have to approximate them. Once this is done we can insert them into the theoretical formulas that we have derived for the corresponding statistic. By doing this we have shown that, in the limit, the empirical averages converge to the derived functional forms. Hence the theory is asymptotically correct, as far as the simulations can tell.

1.2.2 Covariance estimation

In the context of covariance estimation, this thesis develops a framework similar to that of [31] for non-linear shrinkage estimators, but over the larger class of models defined in Section 1.1.2. Recall that the underlying estimators in this class have the form

$$S = \frac{1}{N} YY^T = \frac{1}{N} A^{1/2} X B X^T A^{1/2}. \tag{1.21}$$

We will also make the following assumption (see Assumption 3.2.1).

Assumption 1.2.1. Let the following hold:

• The variables n and N go to infinity simultaneously, in the sense that the limit c = lim_{n→∞} n/N ∈ (0, ∞) exists.

• Xₙ ∈ ℝ^{n×N} consists of i.i.d. variables with mean 0 and variance 1 which also satisfy the Lindeberg-type condition

$$\frac{1}{nN\delta^2}\sum_{i=1}^{n}\sum_{j=1}^{N} \mathbb{E}\left[|X_{i,j}|^2\, \mathbf{1}_{(\delta\sqrt{N},\,\infty)}\!\left(|X_{i,j}|\right)\right] \to 0$$

for each δ > 0 as n → ∞.

• Aₙ is an n × n random (or deterministic) positive definite Hermitian matrix, independent of Xₙ and Bₙ.

• Bₙ is an N × N random (or deterministic) positive definite Hermitian matrix, independent of Xₙ and Aₙ.

• If we denote the eigenvalues of Aₙ by τ_{A,1} ≥ τ_{A,2} ≥ ⋯ ≥ τ_{A,n}, then $H_{A_n}(\tau) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{[\tau_{A,i},\infty)}(\tau)$ converges a.s. to a non-random limit H_A at every continuity point of H_A. H_A defines a probability distribution function whose support is included in the compact interval [h_{A,1}, h_{A,2}], where 0 < h_{A,1} ≤ h_{A,2} < ∞.

• If we denote the eigenvalues of Bₙ by τ_{B,1} ≥ τ_{B,2} ≥ ⋯ ≥ τ_{B,N}, then $H_{B_n}(\tau) = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{[\tau_{B,i},\infty)}(\tau)$ converges a.s. to a non-random limit H_B at every continuity point of H_B. H_B defines a probability distribution function whose support is included in the interval [h_{B,1}, ∞), where 0 < h_{B,1} < ∞.

We will henceforth let Fₙ denote the spectral distribution of (1.21), and refer to its Stieltjes transform as m_{Fₙ}(z). Under these assumptions we then show the following theorem. It can be thought of as a generalization of the Marchenko-Pastur equation, to which it can be reduced (see further Theorem 3.2.1).

Theorem 1.2.1. Under Assumption 1.2.1 we have, for all z ∈ ℂ⁺, that Θ_g^N(z) → Θ_g(z) a.s. if g is a bounded and continuous function on [h_{A,1}, h_{A,2}] with finitely many points of discontinuity, where

$$\Theta_g(z) = \int_{\mathbb{R}} \frac{g(a)}{a \int_{\mathbb{R}} \frac{b}{1 + bc\,e_F(z)}\, dH_B(b) - z}\, dH_A(a). \tag{1.22}$$

This theorem will be the backbone of the analysis of the optimal estimator of g(Σ). If we start with the case g(τ) = τ, we follow [31] and define the function

$$\Delta_n(x) = \frac{1}{n}\sum_{i=1}^{n} U_{\bullet,i}^T A_n U_{\bullet,i} \times \mathbf{1}_{[\lambda_i,\infty)}(x),$$

as using this we can retrieve the values of dᵢ as

$$d_i = \lim_{\epsilon\to 0} \frac{\Delta_n(\lambda_i + \epsilon) - \Delta_n(\lambda_i - \epsilon)}{F_n(\lambda_i + \epsilon) - F_n(\lambda_i - \epsilon)}, \qquad \forall i = 1, \dots, n. \tag{1.23}$$

By using Theorem 1.2.1 with g(τ) = τ and the fact that the imaginary part of the Stieltjes transform is proportional to the asymptotic spectral density, we get that the asymptotic behavior of Δₙ is

$$\Delta_n(x) = \lim_{\eta\to 0^+} \frac{1}{\pi} \int_{-\infty}^{x} \mathrm{Im}\left[\Theta_n^{(1)}(\xi + i\eta)\right] d\xi,$$

from which we then continue to show the following theorem (see further Theorem 3.3.1).

Theorem 1.2.2. Assume that Assumption 1.2.1 holds. Then Δₙ(x) → Δ(x) a.s. for all x ∈ ℝ \ {0}. Further, if c ≠ 1 we have that Δ(x) = ∫_{−∞}^x δ(λ) dF(λ), where

$$\delta(\lambda) = \begin{cases} \mathrm{Im}\left[e_F(\lambda)\right]/\mathrm{Im}\left[m_F(\lambda)\right] & \text{if } \lambda > 0, \\[0.5ex] \frac{c}{1-c}\,e_F(0) & \text{if } \lambda = 0 \text{ and } c > 1, \\[0.5ex] 0 & \text{otherwise,} \end{cases} \tag{1.24}$$

where

$$e_F(0) := \lim_{\xi\to 0^+}\lim_{\eta\to 0^+} (\xi + i\eta) \times e_F(\xi + i\eta).$$

The main consequence of this, which one gets by combining it with (1.23), is that asymptotically

$$\lim_{n\to\infty} d_i = \lim_{n\to\infty}\lim_{\epsilon\to 0} \frac{\Delta_n(\lambda_i + \epsilon) - \Delta_n(\lambda_i - \epsilon)}{F_n(\lambda_i + \epsilon) - F_n(\lambda_i - \epsilon)} = \delta(\lambda_i), \qquad \forall i = 1, \dots, n. \tag{1.25}$$

Hence δ(·) is the optimal non-linear shrinkage function.

As mentioned in Section 1.1.2, it is also common to estimate the precision matrix directly, instead of just estimating the covariance matrix and inverting it. This means that we are now interested in g(τ) = τ⁻¹. We can then follow the same pattern as above and define

$$\Psi_n(x) := \frac{1}{n}\sum_{i=1}^{n} U_{\bullet,i}^T A_n^{-1} U_{\bullet,i} \times \mathbf{1}_{[\lambda_i,\infty)}(x),$$

and from this show the following theorem (see Theorem 3.3.2).

Theorem 1.2.3. Assume that Assumption 1.2.1 holds. Then Ψₙ(x) → Ψ(x) a.s. for all x ∈ ℝ \ {0}. When c ≠ 1 we have that Ψ(x) = ∫_{−∞}^x ψ(λ) dF(λ), where

$$\psi(\lambda) = \begin{cases} \dfrac{\mathrm{Im}\left[m_F(\lambda)\left[\lambda\, m_F(\lambda) + 1\right]/e_F(\lambda)\right]}{\mathrm{Im}\left[m_F(\lambda)\right]} & \text{if } \lambda > 0, \\[1.5ex] \dfrac{c-1}{e_F(0)} + \dfrac{c\, m_{H_A}(0)}{c-1} & \text{if } \lambda = 0 \text{ and } c > 1, \\[1.5ex] 0 & \text{otherwise.} \end{cases} \tag{1.26}$$

The conclusion of Theorem 1.2.3 is similar to that of Theorem 1.2.2, in that the optimal estimator of the precision matrix is given by dᵢ = ψ(λᵢ). This is seen by using the same arguments as those in Section 1.1.2 to get the optimal estimator. More exactly, the optimal estimator of the precision matrix, having the same eigenvectors as the original covariance estimator, is UD̃Uᵀ, with D̃ diagonal with entries D̃ᵢ,ᵢ = Uᵀ_{•,i}Aₙ⁻¹U_{•,i}.

Note that the above theorems only guarantee optimality asymptotically; there is no guarantee of any form for finite samples. We therefore perform simulation experiments to show that these estimators indeed perform close to the achievable limit. This is measured by looking at how far the estimation risk is from the risk obtained by the oracle estimator that uses the unobservable quantities directly (e.g. dᵢ = Uᵀ_{•,i}ΣU_{•,i} when estimating the covariance), normalized by the distance to the regular sample estimator. Our results show that even in samples with as few as 20 variables, the asymptotic estimator is close to the optimal one.

These simulations also motivate examples of how these estimators could be used. The first is an example where we estimate the covariance of elliptical data. One approach is to simply use the sample estimator (which has the form (1.21)) and shrink it. Another approach is to use a different estimator and shrink that one, assuming it still has the form (1.21). We illustrate this second approach using the so-called Maronna robust estimator, which tries to remove the ellipticity from the data. One might at first think that this second approach is equivalent to the first, as we still shrink the eigenvalues after applying the robust estimator. This is however false, as part of the gain in using a robust estimator is that it improves the estimation accuracy of the eigenvectors, which are exactly what defines the subclass we are optimally shrinking within. The shrinkage estimator then proceeds with these improved eigenvectors and finds an optimal estimator in the class they define. We show that this procedure converges to the optimal estimator, for both the covariance matrix and the precision matrix. We also consider a simulation study of data which has a spatio-temporal structure. One way to think about this is that we have some true covariance matrix Σ = A ⊗ B from which we have one sample. The separate estimators of A and B then have the form (1.21), so we can use our theory to estimate them. In the case of linear time series, one can think of A as representing the local covariance structure and B the auto-covariance structure. In our simulations we show that one can estimate A and B simultaneously and that the risk of each converges to the optimal limit.
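The separable, spatio-temporal case is easy to illustrate. The sketch below (assuming NumPy; the matrices A and B are arbitrary illustrative choices) draws one matrix-normal sample Y = A^{1/2}XB^{1/2} and verifies that the "spatial" sample estimator has exactly the form (1.16)/(1.21), with B acting as the extra inner matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 30, 60

# Illustrative spatial (A) and temporal (B) factors of a separable covariance
# Sigma = A (x) B. B is an AR(1)-like autocovariance with unit diagonal.
A = np.diag(np.linspace(1.0, 3.0, n))
B = 0.5 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
X = rng.standard_normal((n, N))

Asqrt = np.diag(np.sqrt(np.diag(A)))
Bvals, BU = np.linalg.eigh(B)
Bsqrt = BU @ np.diag(np.sqrt(Bvals)) @ BU.T
Y = Asqrt @ X @ Bsqrt                # one matrix-normal sample

# The "spatial" sample estimator then has exactly the form (1.16)/(1.21):
# S = (1/N) Y Y^T = (1/N) A^{1/2} X B X^T A^{1/2}.
S = Y @ Y.T / N
S_form = Asqrt @ X @ B @ X.T @ Asqrt / N
print(np.allclose(S, S_form))        # True
# In expectation, S estimates A scaled by trace(B)/N (= 1 here).
print(np.trace(S) / np.trace(A))
```

This is the sense in which the time-series and spatio-temporal examples fall inside the class (1.21): the extra matrix B enters the estimator even though only A is being estimated.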

1.3 Further background on problems

1.3.1 Covariance and correlation matrices of financial data

Our main results on this topic can be seen as an extension of [50] and [1] to correlation matrices and more general structures. In doing so, we were also able to explain the differences between covariances and correlations observed empirically in [63] and [9].


As for the eigenvalue, it has been considered empirically for both the covariance and the correlation matrix of equity data in e.g. [9, 49, 52, 55, 60, 63]. The finding is that there are strong variations in both cases. As noted in [9], there is an additional complicating factor in the correlation case, as the trace of the correlation matrix is constrained to equal n. Hence an increase in the top eigenvalue might be due to a decrease in one of the others, rather than a change in the top eigenvalue itself (which they termed eigenvalue repulsion). They show that the top eigenvalue anti-correlates almost perfectly with the average value of the 20% smallest eigenvalues. In [63] the dynamics of the estimated eigenvalues are contrasted between covariance and correlation matrices with data from 3 different equity exchanges. They find that the amount of variation in the largest correlation eigenvalues is much lower than for the corresponding covariance eigenvalues, and that this agrees with 'the common lore that the volatilities capture the largest part of the dynamics'. They do not attempt to quantify this effect, though.

On the theory side, the dynamics of the top eigenvalue were first studied in [50], and this analysis was later refined in [1]. They derive a model similar to the one presented here, but only for a one-spiked covariance matrix. Moreover, they only study the covariance matrix, which, as shown here, has clearly different properties than the correlation matrix. The authors then estimate the parameters for 4 different data sets. To compare theory and data they use the variogram

$$\mathbb{E}\left[\left(\lambda_1^{t+\tau} - \lambda_1^{t}\right)^2\right] = \varepsilon^2\sigma^2\left[1 - e^{-\varepsilon\tau}\right], \tag{1.27}$$

which can be computed both from the parameters and as a time average for every fixed τ. By doing this they notice a big discrepancy between the two: the sample estimator is always substantially larger than the value obtained from (1.27). This is then interpreted as the correlation matrix not being static. Their motivation for speaking of correlations is that they pre-normalize the data, so they effectively work with correlation matrices. However, as shown here, the difference between the way covariances and correlations behave can be huge, and thus their conclusion may not hold.
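The variogram comparison can be mimicked in a few lines. In the sketch below (illustrative parameters; a discretized Ornstein-Uhlenbeck process stands in for the linearized eigenvalue dynamics (1.17)), the empirical variogram is computed as a time average and compared to the saturating shape 1 − e^{−ετ} appearing in (1.27), with the overall prefactor normalized away.

```python
import numpy as np

rng = np.random.default_rng(5)
eps, lam1, sigma = 0.02, 5.0, 2.0      # illustrative parameters
T = 400_000

# Linearized eigenvalue dynamics as an Ornstein-Uhlenbeck process,
# d lam = eps*(lam1 - lam)*dt + eps*sigma*dB, discretized with dt = 1.
lam = np.empty(T)
lam[0] = lam1
shocks = rng.standard_normal(T)
for t in range(1, T):
    lam[t] = lam[t - 1] + eps * (lam1 - lam[t - 1]) + eps * sigma * shocks[t]

def variogram(series, tau):
    """Time-average estimator of E[(lam_{t+tau} - lam_t)^2]."""
    d = series[tau:] - series[:-tau]
    return np.mean(d * d)

taus = np.array([10, 50, 100, 200, 400])
v = np.array([variogram(lam, int(t)) for t in taus])
v_inf = variogram(lam, 5000)           # proxy for the saturation level
pred = 1.0 - np.exp(-eps * taus)       # normalized shape from (1.27)
print(np.round(v / v_inf, 2))
print(np.round(pred, 2))
```

For a static covariance the normalized empirical variogram follows the predicted shape closely; the systematic excess observed in [1] on real data is exactly what motivates the non-stationarity interpretation discussed above.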

As for the eigenvector, it has been studied less extensively than the eigenvalue.


This is probably due to the fact that vectors are more complex objects than scalars, which makes it harder to extract insights from data. To our knowledge, the first to study it empirically was [63]. Of interest here, if we let φ₁ᵗ denote the top eigenvector at time t, they define the quantity

$$\gamma = 1 - \mathrm{Tr}\left(\left(\mathbb{E}\left[\varphi_1^t \left(\varphi_1^t\right)^T\right]\right)^2\right),$$

where the expectation is over time. This quantity can vary between 0 and 1, where the first case corresponds to a constant eigenvector and the latter to an eigenvector that is uniformly distributed over the unit sphere. For all 3 equity markets in the study they find that this coefficient is always lower for correlation than for covariance matrices, meaning that the correlation eigenvector is more stable. However, they do not attempt to explain this phenomenon.
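The stability coefficient γ is straightforward to compute from a time series of eigenvectors. The sketch below (assuming NumPy; sizes are illustrative) evaluates it in the two extreme cases mentioned: a constant eigenvector, and eigenvectors drawn uniformly on the unit sphere, for which γ sits near its upper value (1 − 1/n for finite n).

```python
import numpy as np

def stability_gamma(vecs):
    """gamma = 1 - Tr((E[phi phi^T])^2), with the expectation replaced by a
    time average over the columns of `vecs` (assumed unit vectors)."""
    M = np.zeros((vecs.shape[0], vecs.shape[0]))
    for v in vecs.T:
        M += np.outer(v, v)
    M /= vecs.shape[1]
    return 1.0 - np.trace(M @ M)

rng = np.random.default_rng(6)
n, T = 50, 5000

# A constant eigenvector gives gamma ~ 0 ...
phi = rng.standard_normal(n)
phi /= np.linalg.norm(phi)
const = np.tile(phi[:, None], (1, T))
# ... while vectors uniform on the unit sphere give gamma ~ 1 - 1/n.
unif = rng.standard_normal((n, T))
unif /= np.linalg.norm(unif, axis=0)

print(stability_gamma(const))   # 0 up to rounding
print(stability_gamma(unif))    # close to 1 - 1/n = 0.98
```

Applied to eigenvector time series from covariance and correlation estimators, this is the statistic that [63] found to be systematically lower in the correlation case.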

Theoretically, [50] and [1] again study the same model as used here, but again only for one-spiked covariance matrices. Hence our results on eigenvectors are again an extension of theirs. To study the implications of the results for real data, an obvious problem is that xₜ is unobservable, as we never know the true eigenvector φ₁. Hence [1] chose to look at the quantity

$$\mathbb{E}\left[\left(\varphi_1^{t+\tau}\right)^T \varphi_1^{t}\right] \approx 1 - 2\mu\left(1 - e^{-\varepsilon\tau}\right). \tag{1.28}$$

Thus the same type of procedure can be used as in the eigenvalue case: first estimate the parameters, then calculate the time average of (φ₁^{t+τ})ᵀφ₁ᵗ and compare it to (1.28) with the estimated parameters inserted. Doing so for 4 different data sets, [1] show that there is again more variation in this quantity than predicted, again indicating non-stationarity in the covariance matrix. The issue here is the same as in the eigenvalue case: the result concerns the covariance structure, but the correlation structure is what is of interest. In this case there is also a big difference between the formulas for a one-spiked and a multi-spiked model. Real financial data obviously has many spikes, so a one-spiked model will not do well.


1.3.2 Covariance estimation

The type of shrinkage approach studied here is an old idea, first studied by Stein in e.g. [56] under the so-called Stein loss. This loss can essentially be seen as the Kullback-Leibler divergence between the true and the estimated covariance matrix under a Gaussian model. An example that is somewhat simpler and easier to gain intuition from is the so-called linear shrinkage problem introduced in [32]. There, the class of shrinkage estimators considered consists of linear combinations of a scaled identity matrix and the sample covariance. In the framework discussed here, what essentially happens under this approach is that the shrinkage function is constrained to be linear, i.e. λ̃ᵢ = δ(λᵢ) = c₁λᵢ + c₂ for some scalars c₁ and c₂. Just as here, the goal is to minimize the expected Frobenius loss. They show that asymptotically efficient estimators of the parameters of this estimator exist. This is under classical asymptotics, i.e. the number of samples goes to infinity while the population size is kept fixed.

Later, when the linear shrinkage approach was coupled with Stein's earlier approach, it led to the more general question of how one should optimally shrink the eigenvalues without any restriction. This is exactly what was done in [31] under the Frobenius loss. Notice that the name shrinkage estimator might be a bit misleading in this more general setting, as there is no guarantee that the optimal decision is to pull the eigenvalues toward some mean, as in the linear case. In [34] the Stein loss was also studied under this general high-dimensional framework, and the corresponding estimators were shown to be similar to those arising from the Frobenius loss.

More exactly, by assuming conditions on the data similar to ours except that B = I, they show in [31] that the optimal shrinkage function is

$$\delta(\lambda) = \begin{cases} \dfrac{\lambda}{\left|1 - c - c\lambda\, \underline{m}_F(\lambda)\right|^2} & \text{if } \lambda > 0, \\[1.5ex] \dfrac{1}{(c-1)\,\underline{m}_F(0)} & \text{if } \lambda = 0 \text{ and } c > 1, \\[1.5ex] 0 & \text{otherwise,} \end{cases} \tag{1.29}$$

where m̲_F(λ) = lim_{δ→0} m̲_F(λ + iδ) for λ ∈ ℝ, and m̲_F(z) = (c − 1)/z + c m_F(z). Hence


for any finite sample size we would set λ̃ᵢ = δ(λᵢ) as a proxy for the optimal estimator. As mentioned before, there is no guarantee that this is the optimal estimator for finite sample sizes. However, experiments in [31] with simulated data showed that it performs close to the optimal limit defined through (1.15). Compared to its linear counterpart, it is clearly harder to gain intuition from this non-linear shrinkage estimator. The most straightforward way to think about it when λ ≠ 0 is that we simply scale the sample eigenvalues by 1/|1 − c − cλm̲_F(λ)|². The results mentioned above are however (just as ours) of an oracle type, as they

require knowledge of the true eigenvalues through the Stieltjes transform of the sample covariance matrix, m_F(·). It was shown in later papers [33, 35] that there exist estimators of the eigenvalues which are consistent in the thermodynamic limit. These can then be coupled with the above estimation problem to find the optimal covariance estimator. It is noteworthy, however, that the method is computationally very expensive, as it requires the solution of a non-linear optimization program with non-linear constraints. Just as done here, they also show that the same approach can be applied to the precision matrix. Further, in [36] they show how the same methodology can be used to find the optimal portfolio allocation vector. Recall that normally one would simply plug an estimator of either the covariance or the precision matrix into a given formula to get the portfolio allocation. They show, however, that it is possible to use the form of the solution to get the estimator which yields an allocation as close as possible to the optimal one.

Just as the Marchenko-Pastur equation gives the asymptotic law when the sample covariance has the form (1.3), it was shown in [62] that when it has the form (1.16), the Stieltjes transform of the asymptotic density satisfies the coupled equations

$$m_S(z) = \int_{\mathbb{R}} \frac{1}{a \int_{\mathbb{R}} \frac{b}{1 + bc\,e_S(z)}\, dH_B(b) - z}\, dH_A(a), \qquad \forall z \in \mathbb{C}^+,$$

$$e_S(z) = \int_{\mathbb{R}} \frac{a}{a \int_{\mathbb{R}} \frac{b}{1 + bc\,e_S(z)}\, dH_B(b) - z}\, dH_A(a), \qquad \forall z \in \mathbb{C}^+, \tag{1.30}$$

where H_A and H_B are the measures of the asymptotic spectral densities of A and B (which are assumed to exist and be strictly positive) and


e_S(z) = lim_{n→∞} (1/n) Tr[(S − zIₙ)⁻¹A]. Our Theorem 1.2.1 is thus a direct generalization of this result. As for shrinkage estimators of the form (1.21), we are only aware of a partial examination in [7]. Their results are, however, based on the approximate replica method and focus only on elliptical models, so they cover a subset of what we have studied.

1.4 Summary of contributions, discussion & further directions

1.4.1 Covariance and correlation matrices of financial data

To summarize the contribution of the first part of the thesis, we have derived the dynamics of the top eigenvalue and eigenvector of covariance and correlation estimators. This has been done for any covariance matrix (satisfying that the top eigenvalue is much larger than the others) and, as just mentioned, also for correlation matrices, which had not been done before. We have also derived various quantities related to the dynamics (using the dynamical equations), all of which are computable from data. Besides their separate interest, we have shown that the dynamical equations can be contrasted against each other to get a picture of how covariances and correlations differ dynamically. As it turns out, correlation matrices have a systemic stabilizing component that is not present in covariance matrices. Our results have also been verified with extensive direct numerical simulations. We have additionally shown how the idea can be extended to other estimators (e.g. moving-window estimators), which will be discussed further within the thesis.

There are still plenty of questions related to our results that remain unanswered. We have described the behavior of the top eigenvalue and eigenvector of correlation matrices, but we have not said anything about the remaining eigenvalues. These are trickier, as they interact strongly with each other, so a similar theory for each one of them separately is not expected to hold. However one could, as [1] did for covariance matrices, study overlaps between collections of eigenvectors. This should be possible using the same type of calculations as used


here. Another direction would be to approach the problem with something other than perturbation theory, to possibly obtain an exact theory. This might however be very involved, as the formulas are already fairly complex and sometimes hard to interpret even at this approximate stage. From a more practical perspective, it would be interesting to replace the underlying covariance matrix with a dynamic covariance process and see how this alters the eigenvalue dynamics. For example, one could let the true covariance matrix follow some discrete mean-reverting matrix process with Gaussian innovations, as such processes have been studied extensively in the empirical finance literature. The goal would then be to find a model which reproduces the stylized facts of empirical returns.

One conclusion that can be drawn from these results is that correlation matrices can in certain situations differ substantially from covariance matrices from a dynamical viewpoint. As mentioned in a previous section, this type of difference has been observed empirically before: eigenvalues and eigenvectors of correlation matrices exhibit more stability than those of covariance matrices. Attempts have been made to explain this through various effects; we show, however, that part of it is endogenous to the very definition of the correlation matrix. The results can also be used as a metric to determine the amount of fluctuation in the correlation structure, which might be useful when modeling processes describing it. One issue when using these results in practice, though, is that, as mentioned, they are approximate. Hence in a realistic case where ε is not very small, but say of order 0.05, deviations between the empirical estimate of any quantity and its functional form are to be expected due to the approximation error.

1.4.2 Covariance estimation

In the second part of the thesis we have found the estimator which minimizes $\|S_n - g(A)\|_F$ for any continuous $g(\cdot)$, under the constraint that $S_n$ has the same eigenvectors as the given estimator $N^{-1}A^{1/2}XBX^TA^{1/2}$, in the limit where $n \to \infty$ together with $n/N \to c$. As the results are asymptotic, we have verified them with direct numerical simulations and seen that they achieve performance close to optimality even


for finite samples of modest size. Hence we have extended the prior idea of rotationally-equivariant estimators into optimal estimation within a subclass of estimators with the same eigenvectors. For some types of problems (where the $B$ matrix comes from the data), this can be thought of as a pure extension of the prior case. When this is not true, however, it opens up the possibility of freely choosing $B$ (as a weight). Thus a future problem would be to choose a matrix $B$ such that the subclass with the 'best' eigenvectors is found. After that, the approach developed here can be used within this class to find the optimal eigenvalues. This is what we have illustrated, both theoretically and numerically, by using Maronna's M-estimator for elliptical data. Our approach thus gives a clear separation between the eigenvalue and the eigenvector estimation problems, whereas most traditional shrinkage methods only deal with improving the eigenvalue estimation.

A major note here is that these estimators require knowledge of the Stieltjes transform of the limiting spectral density, which by itself can be found through (1.30) by using the limiting spectral densities of $A$ and $B$. Thus this is so far only an oracle result, as the eigenvalues of the true covariance are needed to find the estimator. We suspect, however, that there exist consistent estimators of these, similar to those of [33, 35]. The problem will in this case be even more complex, as it will require solving a non-linear program (of dimensions $n$ and $N$, respectively).

1.5 Outline of thesis

The outline of the thesis is as follows. In Chapter 2 we discuss the characterization of the top eigenvalue and eigenvector for financial data. We start by stating all the assumptions we make about the structure of the asset returns in Section 2.2. Using these assumptions, we then derive results for the distribution of the top eigenvalue of both the covariance and correlation matrices in Section 2.3. We then do a similar analysis for eigenvectors in Section 2.4. In Section 2.5 we derive the results from Section 2.3 for the regular sample estimator instead of the EWMA estimator, and suggest how and why the results from Section 2.4 can be extended in the same way. We then verify the results by Monte Carlo simulations and check their limitations in


Section 2.6. Finally, in Section 2.7 we summarize all our findings and suggest some further extensions. All longer proofs are in Section 2.8.

In Chapter 3 we discuss covariance estimation for high-dimensional matrices. In Section 3.2 we start by giving some motivation as to why we are interested in studying the specific model. We then proceed by stating the assumptions we will use in this chapter. Next we give a background of previous results that are relevant to this chapter, and the section ends with the statement of our first theorem. This is followed by Section 3.3, where we give some background on rotationally-equivariant estimation and then state our next two theorems. In Section 3.4 we show how our results can give robust covariance estimation and spatio-temporal covariance estimation. In Section 3.5 we summarize our findings, and the remaining sections contain the proofs of the theorems.


Chapter 2

Dynamics of the top eigenvalue and eigenvector of empirical correlations of financial data

2.1 Literature review

Since covariance and correlation matrices have a major importance in finance, methods of estimating them for financial data have been studied extensively. As mentioned, there is a surjective mapping between the set of covariance matrices and correlation matrices, where the non-injectivity comes from the fact that covariance matrices with different volatilities can correspond to the same correlation matrix. Because of this close connection, a lot of the properties satisfied by covariance matrices have been assumed to also hold for correlation matrices. The reason for this is that it is substantially easier to deal with covariance matrices, as their sample version (1.1) is simply a sum of sample second moments, whereas correlation matrices also require an extra normalization. When looking at the asymptotic spectral statistics, the first to notice their major difference is, to our knowledge, [2], who mentions that asymptotics in principal component analysis (PCA) for correlation matrices is too complicated to be treated in generality. The problem was later solved in [28] using perturbation theory. A similar problem was also studied in [27].


The main reason for treating covariance and correlation matrices the same much of the time is that if the data is homoscedastic and normalized, then any static covariance estimator is also a correlation estimator. In the context of large matrices it has been shown, e.g. in [19] and [25], that in the thermodynamic limit the limiting empirical spectral distribution of correlation matrices exhibits exactly the same behavior as that of a covariance matrix. In the context of finance this means that if there is a stable relationship between all stocks, then the scaling argument is validated in the thermodynamic limit. It does not, however, guarantee anything for finite sample sizes.

However, it is a known fact that financial relationships are far from stable. This fact has recently led to research on how covariance and correlation matrices behave dynamically, mainly through their spectral distribution. The theory for the dynamical case is currently substantially sparser than for the well-studied static case. The first paper in the area was [17], where the dynamics of the eigenvalues and eigenvectors of a Brownian matrix were studied. This was later extended by [6] to the so-called Wishart processes, which describe the eigenvalue and eigenvector dynamics of the sample covariance of a Brownian matrix. Both of these papers have only found limited applications in finance so far.

The first paper to theoretically study the dynamical properties of eigenvalues in finance is, to our knowledge, [50]. The authors used the well-known fact that, for financial data sets, the largest eigenvalue is orders of magnitude larger than any other eigenvalue (as motivated by [38], its value is approximately equal to the number of stocks times the average correlation for a correlation matrix). Using this and perturbation theory, the authors derived a stochastic differential equation (SDE) describing how the eigenvalues of a covariance matrix evolve when they are estimated using an exponentially weighted moving average (EWMA) estimator. This analysis was later slightly corrected and extended to also include eigenvector dynamics in [1]. There the authors also applied the results to financial data from various exchanges. In both of these papers it is assumed that all the derived properties of covariance matrices directly carry over to correlation matrices by normalizing the data, as all the empirical work is done using correlation matrices.

There has been a large number of papers studying the empirical properties of the


eigenvalues and eigenvectors of covariance and correlation matrices in finance. Most of these only look at the static case and try to identify the significance of the different eigenvectors and see how stable PCA is over a few time periods. The first paper to actually look at the full dynamical case is, to our knowledge, [63]. He defines a large number of quantities based on the eigenstructure of the covariance and correlation matrix using different estimators. A large number of these have since been studied theoretically in [1]. He also compares how covariances and correlations differ, and concludes that correlations fluctuate less in time, but offers no explanation for this result. It is just said that this is in line with 'the common lore that the volatilities capture the largest part of the dynamics'.

An earlier and more empirical paper studying the same effects is [9]. Here the authors mainly study how the top eigenvalue of a correlation matrix evolves in time and how it interacts with the other eigenvalues. The effect they observe is that the top eigenvalue drives the other ones, which they termed eigenvalue repulsion. This effect, whereby an increase of one eigenvalue decreases all the others, has however mainly been studied qualitatively. This paper has since been followed by several papers studying the same effect: [49], [52], [60] and [55].

Throughout this chapter of the thesis we will use the quantum mechanical bra-ket notation for vectors, so $|\cdot\rangle$ will denote a column vector and $\langle\cdot|$ will denote a row vector. We will also let $\overline{\,\cdot\,}$ denote expectations.

2.2 Return structure & assumptions

Our goal in this chapter is to describe the dynamics of the eigenvalues and eigenvectors of sample covariance and correlation matrices under different linear estimators. This would be an infeasible task if no specific structure were assumed for the stock returns. Hence we will assume throughout this chapter that the returns $r_t$ follow a factor model with $K$ factors, i.e. we can decompose the returns as

$$|r_t\rangle = \sum_{j=1}^{K} |\beta_j\rangle z_{j,t} + |\epsilon_t\rangle, \qquad (2.1)$$


where $\beta_j \in \mathbb{R}^N$ satisfies $\|\beta_j\|_2 = 1$ and is deterministic but unknown, and $z_{j,t}$ and $\epsilon_t$ are independent Gaussians with variances $\sigma_1^2, \ldots, \sigma_K^2$ and $\sigma_\epsilon^2$. The different factors could correspond to, e.g., different sectors of the financial market, as in [48]. These sectors will obviously be correlated in any practical application; however, by redefining the factors we can make them independent (i.e., all $\beta_i$ orthogonal). It is then clear that the covariance matrix corresponding to (2.1) is

$$\Sigma = \sigma_\epsilon^2 I + \sum_{j=1}^{K} \sigma_j^2 |\beta_j\rangle\langle\beta_j|. \qquad (2.2)$$

If we define $\lambda_i = \sigma_i^2 + \sigma_\epsilon^2$ for all $i \le K$ and $\lambda_i = \sigma_\epsilon^2$ for all $i > K$, we can then see that the eigenvalues of $\Sigma$ are $(\lambda_1, \ldots, \lambda_N)$, with corresponding eigenvectors $\beta_k$. Hence this model is a scaled version of the spiked covariance model introduced in [26].
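As a quick numerical illustration (a sketch that is not part of the thesis; all parameter values below are made up), the eigenstructure of (2.2) can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 3
sigma_eps2 = 1.0
sigma2 = np.array([25.0, 9.0, 4.0])   # factor variances sigma_1^2, ..., sigma_K^2

# Orthonormal loadings beta_1, ..., beta_K (deterministic, ||beta_j||_2 = 1)
beta, _ = np.linalg.qr(rng.standard_normal((N, K)))

# Sigma = sigma_eps^2 I + sum_j sigma_j^2 |beta_j><beta_j|   (eq. 2.2)
Sigma = sigma_eps2 * np.eye(N) + (beta * sigma2) @ beta.T

evals = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
print(evals[:K])   # the K spikes: sigma_j^2 + sigma_eps^2
print(evals[K:])   # the bulk: all equal to sigma_eps^2
```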

For the above model, it has been shown empirically in finance that the largest eigenvalue is orders of magnitude larger than the others. Hence in [50] the authors choose to use the structure of (2.1) but with $K = 1$, which implies that all eigenvalues besides the first one have the same value: any eigenvalue besides the maximal one is assumed to be equal to the common bulk value. In the follow-up and correctional paper [1], this same assumption on the structure is used again.

From the structure of the covariance matrix, it follows that the implied correlationmatrix is

$$C = V\left(\sigma_\epsilon^2 I + \sum_{j=1}^{K} \sigma_j^2 |\beta_j\rangle\langle\beta_j|\right)V = \sigma_\epsilon^2 V^2 + \sum_{j=1}^{K} \sigma_j^2\, V|\beta_j\rangle\langle\beta_j|V, \qquad (2.3)$$

where $V$ is a diagonal matrix whose diagonal elements are the inverses of the standard deviations. Since the main focus and contribution of this topic within the thesis is with respect to correlation matrices, we note that the simplification $\lambda_i = \lambda_2$ for all $i \neq 1$ is not even correct in a simple one-spiked covariance model. In this case we get a correlation matrix $C$ of the form

$$C = \sigma_\epsilon^2 V^2 + \sigma^2\, V|\beta\rangle\langle\beta|V. \qquad (2.4)$$


Since the eigenspace coming from the identity part becomes non-degenerate (unless $V$ is a multiple of the identity matrix), the eigenvalues and eigenvectors now become non-trivial. Hence correlation matrices arising from this kind of factor model do not correspond to a spiked distribution (but instead to some complicated distribution that seems to resemble a uniform distribution for a random one-factor model). Due to this, we will not assume that all the bulk eigenvalues are equal, since that is extremely far from the truth.
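This degeneracy-breaking is easy to observe numerically; the following sketch (with made-up one-spiked parameters) compares the bulk spectra of a covariance matrix and its implied correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
beta = rng.standard_normal(N)
beta /= np.linalg.norm(beta)

# One-spiked covariance: the bulk is fully degenerate (all bulk eigenvalues = 1)
Sigma = np.eye(N) + 20.0 * np.outer(beta, beta)
# Implied correlation matrix C = V Sigma V  (eq. 2.4), with V diagonal
v = 1.0 / np.sqrt(np.diag(Sigma))
C = Sigma * np.outer(v, v)

cov_bulk = np.sort(np.linalg.eigvalsh(Sigma))[:-1]
cor_bulk = np.sort(np.linalg.eigvalsh(C))[:-1]
print(np.ptp(cov_bulk))   # ~ 0: the covariance bulk is degenerate
print(np.ptp(cor_bulk))   # clearly > 0: the correlation bulk spreads out
```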

For this covariance structure, we will also assume that the top eigenvalue is much bigger than the remaining ones, and we will assume the same for the correlation matrix, which in turn imposes some extra conditions on the covariance matrix. Notice that one does not imply the other. For example, the first condition would be satisfied in a model where all the returns are independent and one asset has a substantially higher volatility than the remaining ones; for the corresponding correlation matrix, however, all eigenvalues are equal. We will not quantify the exact separation necessary, as most results are anyway asymptotic and approximate, since perturbation theory is used.

We will also use the assumption that $\epsilon N$ converges and that $\epsilon \to 0$. As $1/\epsilon$ corresponds to the effective number of samples used, the first condition can be thought of as analogous to the standard condition in random matrix theory, namely that the ratio of the population size to the number of samples is kept fixed while the number of samples goes to infinity. We will, however, always be looking at finite values of $\epsilon$, since when $\epsilon$ is 0, so are the fluctuations. But we will assume that the time between samples, $\Delta t$, goes to 0. This is mainly because we want to make continuous-time approximations in order to use differential calculus.

2.3 Eigenvalue dynamics

Using the assumptions of the previous sections, we will here see how the top eigenvalue of a covariance or correlation matrix varies in time. The section is split into two subsections, where the first contains the results for covariance matrices and the second contains the results for correlation matrices. Both subsections have the


same structure, but the correlation case is substantially more involved; part of it is therefore deferred to Appendix 2.8.2.

2.3.1 Covariance matrices

We will here follow the same type of approach as in [1]. Thus we estimate the empirical covariance matrix $E_t$ using an EWMA estimator with parameter $\epsilon \ll 1$, so that

$$E_t = (1-\epsilon)E_{t-1} + \epsilon|r_t\rangle\langle r_t|. \qquad (2.5)$$
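In code, the recursion (2.5) is a single line per step. The sketch below (illustrative parameters only) also verifies its equivalent form as an exponentially decaying weighted sum of past outer products:

```python
import numpy as np

def ewma_cov(returns, eps, E0):
    """EWMA covariance: E_t = (1 - eps) E_{t-1} + eps r_t r_t^T  (eq. 2.5)."""
    E = E0.copy()
    for r in returns:
        E = (1.0 - eps) * E + eps * np.outer(r, r)
    return E

rng = np.random.default_rng(2)
T, N, eps = 200, 4, 0.05
returns = rng.standard_normal((T, N))
E0 = np.eye(N)

E = ewma_cov(returns, eps, E0)

# Equivalent closed form: exponentially decaying weights on past outer products
w = eps * (1.0 - eps) ** np.arange(T - 1, -1, -1)   # oldest observation, smallest weight
E_sum = (1.0 - eps) ** T * E0 + sum(wi * np.outer(r, r) for wi, r in zip(w, returns))
print(np.allclose(E, E_sum))   # True
```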

If we let $\lambda_1^t$ denote the largest eigenvalue of $E_t$, we get using first-order perturbation theory that

$$\lambda_1^t = (1-\epsilon)\lambda_1^{t-1} + \epsilon\langle\phi_1^{t-1}|E|\phi_1^{t-1}\rangle + \epsilon\langle\phi_1^{t-1}|\left(|r_t\rangle\langle r_t| - E\right)|\phi_1^{t-1}\rangle, \qquad (2.6)$$

where $\phi_1^t$ is the top eigenvector and $E$ is the true covariance matrix. Now set $(\eta_t)_{i,j} = r_{t,i}r_{t,j} - E_{i,j}$. Since the returns are $r_t \sim \mathcal{N}(0, E)$, we get that

$$\overline{\eta_t} = 0, \qquad \overline{(\eta_t)_{i,j}(\eta_t)_{k,l}} = E_{i,k}E_{j,l} + E_{i,l}E_{j,k}, \qquad (2.7)$$

and thus we see that $\overline{\langle\phi_1^{t-1}|\eta_t|\phi_1^{t-1}\rangle} = 0$. Following [50] we define $\cos(\theta_t) = \langle\phi_1^t|\phi_1\rangle$, where $\theta_t$ is the angle between the true and the estimated top eigenvector. From this we can decompose the estimated top eigenvector into the true top eigenvector component and an orthogonal component as

$$|\phi_1^t\rangle = \cos(\theta_t)|\phi_1\rangle + \sin(\theta_t)|\phi_\perp^t\rangle. \qquad (2.8)$$

The angle $\theta_t$ will be the object of interest in Section 2.4. We will now use our assumption that $\lambda_1 \gg \lambda_2$, as well as the fact that $\cos(\theta_t)$ is sufficiently large. We thus get that $\langle\phi_1^{t-1}|E|\phi_1^{t-1}\rangle \approx \cos^2(\theta_{t-1})\lambda_1$ and that

$$\begin{aligned}
\overline{\langle\phi_1^{t-1}|\eta_t|\phi_1^{t-1}\rangle^2}
&= \overline{\Bigg(\sum_{i=1}^{N}\sum_{j=1}^{N} \phi_{1,i}^{t-1}(\eta_t)_{i,j}\phi_{1,j}^{t-1}\Bigg)^2}
= \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N}\sum_{l=1}^{N} \phi_{1,i}^{t-1}\phi_{1,j}^{t-1}\phi_{1,k}^{t-1}\phi_{1,l}^{t-1}\,\overline{(\eta_t)_{i,j}(\eta_t)_{k,l}} \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N}\sum_{l=1}^{N} \phi_{1,i}^{t-1}\phi_{1,j}^{t-1}\phi_{1,k}^{t-1}\phi_{1,l}^{t-1}\,(E_{i,k}E_{j,l} + E_{i,l}E_{j,k}) \\
&\approx \cos^4(\theta_{t-1})\sum_{i=1}^{N}\sum_{k=1}^{N}\phi_{1,i}E_{i,k}\phi_{1,k}\Bigg(\sum_{j=1}^{N}\sum_{l=1}^{N}\phi_{1,j}E_{j,l}\phi_{1,l}\Bigg)
+ \cos^4(\theta_{t-1})\sum_{i=1}^{N}\sum_{l=1}^{N}\phi_{1,i}E_{i,l}\phi_{1,l}\Bigg(\sum_{j=1}^{N}\sum_{k=1}^{N}\phi_{1,j}E_{j,k}\phi_{1,k}\Bigg) \\
&= 2\cos^4(\theta_{t-1})\lambda_1^2. \qquad (2.9)
\end{aligned}$$
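The moment identity (2.7) and the resulting variance $2\lambda_1^2$ in (2.9) can be checked by Monte Carlo (a sketch with made-up parameters, evaluating the quadratic form at the true top eigenvector, i.e. with $\cos\theta = 1$):

```python
import numpy as np

rng = np.random.default_rng(3)
N, lam1 = 20, 10.0
phi = np.ones(N) / np.sqrt(N)                       # true top eigenvector
E = np.eye(N) + (lam1 - 1.0) * np.outer(phi, phi)   # spiked true covariance

M = 200_000
r = rng.multivariate_normal(np.zeros(N), E, size=M)   # r_t ~ N(0, E)
q = (r @ phi) ** 2 - lam1       # <phi|eta_t|phi> with eta_t = r_t r_t^T - E

print(q.mean())                 # ~ 0, as implied by (2.7)
print(q.var() / lam1**2)        # ~ 2, the variance appearing in (2.9)
```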

By inserting all of this into (2.6), it simplifies to

$$\lambda_1^t \approx (1-\epsilon)\lambda_1^{t-1} + \epsilon\cos^2(\theta_{t-1})\lambda_1\left[1 + \xi_t\right], \qquad (2.10)$$

where $\xi_t$ is a random variable with mean 0 and variance 2. If we now let the length of the estimation window go to 0, we can approximate (2.10) by the corresponding SDE (correcting a minor mistake in [1])

$$d\lambda_1^t = \epsilon\left[\left(\cos^2(\theta_t)\lambda_1 - \lambda_1^t\right)dt + \sqrt{2}\,\lambda_1\cos^2(\theta_t)\,dB_t\right]. \qquad (2.11)$$

As shown in [1], and from Sections 2.4.1 and 2.6 here, we can set $\cos^2(\theta_t) \approx 1$. In Appendix 2.8.1 we have shown (mainly to extend the same proof idea to a similar process in Section 2.5) that the variogram of this process, under the approximation that $\cos(\theta_t) \approx 1$ (so that it is an Ornstein–Uhlenbeck process), satisfies

$$\overline{(\lambda_1^{t+\tau} - \lambda_1^t)^2} \approx 2\epsilon\lambda_1^2\left(1 - e^{-\epsilon\tau}\right), \qquad (2.12)$$


where $\tau > 0$. Another quantity of interest that is easy to estimate from data is the stationary distribution of the eigenvalue, i.e. its distribution after a time long enough that the impact of the initial condition has vanished. It is well known that Ornstein–Uhlenbeck processes are Gaussian, so in this case the stationary distribution is

$$p_\infty(x) = \frac{1}{\sqrt{2\pi\epsilon\lambda_1^2}}\exp\left(-\frac{(x-\lambda_1)^2}{2\epsilon\lambda_1^2}\right). \qquad (2.13)$$
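Under $\cos^2(\theta_t) \approx 1$, (2.11) is an Ornstein–Uhlenbeck process, so the stationary statistics implied by (2.12) and (2.13) can be checked with a simple Euler–Maruyama simulation (a sketch; parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
lam1, eps = 5.0, 0.02
dt, T, burn = 0.1, 500_000, 50_000

# Euler-Maruyama for d lambda = eps[(lam1 - lambda) dt + sqrt(2) lam1 dB]  (2.11, cos = 1)
lam = np.empty(T)
lam[0] = lam1
dB = rng.standard_normal(T - 1) * np.sqrt(dt)
for t in range(1, T):
    lam[t] = lam[t - 1] + eps * (lam1 - lam[t - 1]) * dt + eps * np.sqrt(2.0) * lam1 * dB[t - 1]

x = lam[burn:]
print(x.mean())                    # ~ lam1, the stationary mean in (2.13)
print(x.var() / (eps * lam1**2))   # ~ 1: stationary variance eps * lam1^2
```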

Comparing the limiting distribution with discrete data

An important note here is that $\langle\phi_1^{t-1}|\eta_t|\phi_1^{t-1}\rangle$ has the form of a shifted generalized chi-squared distribution and thus has very heavy tails. This implies that if we were to compare the sample paths of (2.6) for finite $\epsilon$, or of any empirical data, directly to those of (2.11), they would look substantially different. Hence the main strength of this kind of approximation lies in the implied second-order information.

2.3.2 Correlation matrices

Our goal is now to do exactly the same analysis as in the previous section, but for correlation matrices, and in particular to define the dynamical update of the correlation matrix under an EWMA estimator. We first have to define the diagonal matrix $V_t$ containing the inverse volatilities,

$$(V_t)_{i,i} = \frac{1}{\sqrt{(E_t)_{i,i}}} = \frac{1}{\sqrt{(1-\epsilon)(E_{t-1})_{i,i} + \epsilon r_{t,i}^2}}.$$

Based on this we can define the empirical correlation matrix at any point in time as $C_t = V_tE_tV_t$. The EWMA update for the correlation matrix is thus

$$C_t = V_t\left[(1-\epsilon)E_{t-1} + \epsilon|r_t\rangle\langle r_t|\right]V_t = (1-\epsilon)V_tE_{t-1}V_t + \epsilon V_t|r_t\rangle\langle r_t|V_t.$$
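The update above can be sketched in code as follows (illustrative parameters; the point is simply that $C_t$ is the EWMA covariance renormalized by its own diagonal at every step):

```python
import numpy as np

def ewma_corr_step(E_prev, r, eps):
    """One EWMA step for E_t, then renormalize: C_t = V_t E_t V_t."""
    E = (1.0 - eps) * E_prev + eps * np.outer(r, r)
    v = 1.0 / np.sqrt(np.diag(E))    # (V_t)_{ii} = 1 / sqrt((E_t)_{ii})
    return E, E * np.outer(v, v)     # elementwise form of V_t E_t V_t

rng = np.random.default_rng(5)
N, eps = 5, 0.05
E = np.eye(N)
for _ in range(500):
    E, C = ewma_corr_step(E, rng.standard_normal(N), eps)

print(np.allclose(np.diag(C), 1.0))   # True: unit diagonal by construction
print(np.allclose(C, C.T))            # True: symmetric
```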


If we now define $\Delta V_t = V_t - V_{t-1}$ as the change in inverse volatility, we can rewrite the first term on the right-hand side as

$$V_tE_{t-1}V_t = (V_{t-1} + \Delta V_t)E_{t-1}(V_{t-1} + \Delta V_t) \approx C_{t-1} + \Delta V_tE_{t-1}V_{t-1} + V_{t-1}E_{t-1}\Delta V_t,$$

where we omitted the squared term, since $\Delta V_t$ is of order $\epsilon$, as we will see below. Using a Taylor expansion for each diagonal element, we can approximate $\Delta V_t$ to first order as

$$\begin{aligned}
(\Delta V_t)_{i,i} = (V_t)_{i,i} - (V_{t-1})_{i,i}
&= \frac{1}{\sqrt{(E_{t-1})_{i,i} + \epsilon\left[r_{t,i}^2 - (E_{t-1})_{i,i}\right]}} - \frac{1}{\sqrt{(E_{t-1})_{i,i}}} \\
&\approx -\frac{1}{2}\epsilon\left[r_{t,i}^2 - (E_{t-1})_{i,i}\right]\frac{1}{(E_{t-1})_{i,i}^{3/2}}
= \frac{\epsilon}{2\sqrt{(E_{t-1})_{i,i}}}\left(1 - \frac{r_{t,i}^2}{(E_{t-1})_{i,i}}\right). \qquad (2.14)
\end{aligned}$$
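The first-order expansion (2.14) is easy to verify numerically for a single diagonal entry (a minimal sketch with made-up numbers):

```python
import numpy as np

eps = 1e-4
E_ii = 2.5    # (E_{t-1})_{i,i}
r2 = 4.0      # r_{t,i}^2

exact = 1.0 / np.sqrt(E_ii + eps * (r2 - E_ii)) - 1.0 / np.sqrt(E_ii)
approx = 0.5 * eps / np.sqrt(E_ii) * (1.0 - r2 / E_ii)   # right-hand side of (2.14)

print(exact, approx)
print(abs(exact - approx) / abs(exact))   # relative error is O(eps)
```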

If we also define $\delta r_t$ to be the diagonal matrix

$$(\delta r_t)_{i,j} = \begin{cases} r_{t,i} & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases}$$

then (2.14) can be simplified to $\Delta V_t = \frac{1}{2}\epsilon V_{t-1}\left(I - V_{t-1}\delta r_t^2 V_{t-1}\right)$ (note that the matrices commute, since they are diagonal). Using this and the fact that $|\phi_1^{t-1}\rangle$ is an eigenvector of $V_{t-1}E_{t-1}V_{t-1}$, we get to first-order accuracy in $\epsilon$

$$\begin{aligned}
\langle\phi_1^{t-1}|\Delta V_tE_{t-1}V_{t-1}|\phi_1^{t-1}\rangle
&= \langle\phi_1^{t-1}|V_{t-1}E_{t-1}\Delta V_t|\phi_1^{t-1}\rangle \\
&\approx \frac{1}{2}\epsilon\langle\phi_1^{t-1}|V_{t-1}E_{t-1}V_{t-1}\left(I - V_{t-1}\delta r_t^2 V_{t-1}\right)|\phi_1^{t-1}\rangle \\
&= \frac{1}{2}\epsilon\lambda_1^{t-1}\left(1 - \langle\phi_1^{t-1}|V_{t-1}\delta r_t^2 V_{t-1}|\phi_1^{t-1}\rangle\right).
\end{aligned}$$


Thus the first-order approximation to the first eigenvalue of the correlation matrix is

$$\begin{aligned}
\lambda_1^t &\approx (1-\epsilon)\lambda_1^{t-1} + 2(1-\epsilon)\langle\phi_1^{t-1}|\Delta V_tE_{t-1}V_{t-1}|\phi_1^{t-1}\rangle + \epsilon\langle\phi_1^{t-1}|V_t|r_t\rangle\langle r_t|V_t|\phi_1^{t-1}\rangle \\
&\approx (1-\epsilon)\lambda_1^{t-1} + (1-\epsilon)\epsilon\lambda_1^{t-1}\left(1 - \langle\phi_1^{t-1}|V_{t-1}\delta r_t^2 V_{t-1}|\phi_1^{t-1}\rangle\right) + \epsilon\langle\phi_1^{t-1}|C|\phi_1^{t-1}\rangle + \epsilon\langle\phi_1^{t-1}|\left(V_tr_tr_t^TV_t - C\right)|\phi_1^{t-1}\rangle \\
&\approx \lambda_1^{t-1} + \epsilon\left(\cos^2(\theta_t)\lambda_1 - \lambda_1^{t-1}\right) + \epsilon\lambda_1^{t-1}\left(1 - \langle\phi_1^{t-1}|V_{t-1}\delta r_t^2 V_{t-1}|\phi_1^{t-1}\rangle\right) + \epsilon\langle\phi_1^{t-1}|\left(V_tr_tr_t^TV_t - C\right)|\phi_1^{t-1}\rangle \\
&\approx \lambda_1^{t-1} + \epsilon\left(\cos^2(\theta_t)\lambda_1 - \lambda_1^{t-1}\right) + \epsilon\langle\phi_1^{t-1}|V_{t-1}\zeta_tV_{t-1}|\phi_1^{t-1}\rangle, \qquad (2.15)
\end{aligned}$$

where we have defined $\zeta_t = |r_t\rangle\langle r_t| - E - \left(\lambda_1^{t-1}\delta r_t^2 - E_{t-1}\right)$. We can see that the final result looks similar to the covariance-matrix case of Section 2.3.1; however, the last term in the expression above is fairly involved, and hence we defer its details to Appendix 2.8.2. There we show that $\langle\phi_1^{t-1}|V_{t-1}\zeta_tV_{t-1}|\phi_1^{t-1}\rangle$ has mean 0 and variance $\sigma^2 = 2(1+\alpha_1)(\lambda_1)^2 - 4\gamma_1\lambda_1$, where

$$\begin{aligned}
\alpha_1 &= \sum_{i=1}^{N}\sum_{j=1}^{N}\left(\phi_{1,i}^{t-1}[C_{t-1}]_{i,j}\phi_{1,j}^{t-1}\right)^2, \\
\gamma_1 &= \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N}\phi_{1,i}^{t-1}[C_{t-1}]_{i,k}\phi_{1,k}^{t-1}\,\phi_{1,j}^{t-1}[C_{t-1}]_{j,k}\phi_{1,k}^{t-1}. \qquad (2.16)
\end{aligned}$$

In Figure 2.1 we have plotted $\sigma^2$ as a function of $\lambda_1$ for $N = 100$. We can see that as $\lambda_1 \to N$, $\sigma \to 0$. Thus this avoids the inconsistency of previous results mentioned in the introduction. We can also contrast the value of $\sigma^2$ for correlation matrices with the corresponding value for covariance matrices, $2\lambda_1^2$. In [1] these two quantities are assumed to be the same in the empirical results derived from these relations. We have therefore plotted the quotient of these values in Figure 2.2 to visualize the strikingly big difference. We can see that for very small values they are very similar; however, when $\lambda_1 \gg \lambda_2$ (which is the assumed regime) there is a significant difference between the two.

To get a more quantitative comparison, we can use the result from [38]. Here the


Figure 2.1: Plot of the variance of $\lambda_1$ against the value of $\lambda_1$ for a one-factor model with constant-variance factor and $N = 100$. We can see that when $\lambda_1 > N/2$ the variance is decreasing. This can be contrasted with the covariance eigenvalue, where the corresponding curve is quadratic.

correlation matrix is approximated by a one-spiked correlation model. This means that the correlation matrix equals 1 on the diagonal and all off-diagonal entries are equal to some value $\rho$. For an observed matrix we would thus set $\rho = \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j\neq i} C_{i,j}$. Under this model it follows that, up to first order in the deviation from a one-spiked model, $\lambda_1 \approx N\rho$ and $\lambda_2 \approx 1-\rho$. We can now use this model to approximate the coefficients in the variance function. By doing so we get that $\alpha_1 \approx \rho^2$ and $\gamma_1 \approx \lambda_1\rho$. Hence it follows that $\sigma^2 \approx 2(1-\rho)^2(\lambda_1)^2$, which can again be compared to the covariance case, where the variance was $2(\lambda_1)^2$.

What this says is that the larger the average correlation is, the less the top eigenvalue of the correlation matrix will fluctuate. This behavior is very intuitive, as can be seen by examining the corner case where all assets in the market correlate perfectly and have non-zero variance. There we can at any point in time estimate the top eigenvalue of the correlation matrix perfectly, as it is equal to the number of assets, whereas the top eigenvalue of the covariance matrix will always fluctuate strongly. For small values of $\lambda_1$ the behavior of the correlation eigenvalue is very similar to the covariance case. By using the one-spiked


correlation approximation further, we also find that $\sigma^2 \approx 2N^2\rho^2(1-\rho)^2$. It is trivial to see that this function is maximal when $\rho = 1/2$, which agrees well with the general case shown in Figure 2.2.
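The claimed maximum at $\rho = 1/2$ is immediate to confirm numerically (a trivial sketch):

```python
import numpy as np

rho = np.linspace(0.0, 1.0, 100_001)
f = rho**2 * (1.0 - rho)**2    # sigma^2 is proportional to rho^2 (1 - rho)^2
print(rho[np.argmax(f)])       # 0.5
```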

Figure 2.2: The plot depicts the ratio of the variance of the top eigenvalue of a correlation matrix to the corresponding covariance approximation. We can see that for small values the variances are the same, but as $\lambda_1$ grows the values diverge strongly.

If we now, as in the previous section, take the limit as $\epsilon \to 0$ and let the time between observations go to 0, we get that (2.15) converges to the SDE

$$d\lambda_1^t = \epsilon\left[\left(\cos^2(\theta_t)\lambda_1 - \lambda_1^t\right)dt + \sigma\cos^2(\theta_t)\,dB_t\right]. \qquad (2.17)$$

Thus we can see that the correlation eigenvalue is described by the same class of SDEs as the covariance eigenvalue. Hence we get, assuming also that $\cos(\theta_t) \approx 1$ in this case, that

$$\overline{(\lambda_1^{t+\tau} - \lambda_1^t)^2} \approx \epsilon\sigma^2\left(1 - e^{-\epsilon\tau}\right),$$

for $\tau > 0$. Similarly to the covariance case, we also have a Gaussian stationary distribution in this case, i.e.

$$p_\infty(x) = \frac{1}{\sqrt{\pi\epsilon\sigma^2}}\exp\left(-\frac{(x-\lambda_1)^2}{\epsilon\sigma^2}\right). \qquad (2.18)$$

2.4 Eigenvector dynamics

We will now step away from studying the eigenvalue $\lambda_1$ and instead study its corresponding eigenvector in the case of a one-factor model. As mentioned in the previous section, we will not study the full distribution of the eigenvector, but rather the angle $\theta_t$ between the eigenvector at time $t$ and its true value. As this angle is expected to be small, we follow [1] and define $x_t = 1 - \cos(\theta_t)$, so that $x_t$ is expected to be close to 0.

2.4.1 Covariance matrices

The dynamics of $x_t$ for covariance matrices has been characterized in [1]. In Appendix 2.8.3 we re-derive these results for the same extension as in Section 2.3 (i.e. a non-uniform bulk distribution), using the same basic idea as in [1] but with some slight modifications. This does not, however, alter the form of the results. As seen there, $x_t$ is governed by the SDE

$$dx_t = 2\epsilon\left(\mu_{\mathrm{cov}} - x_t\right)dt + \epsilon\sqrt{2x_t\left(4x_t + c_{\mathrm{cov}}\right)}\,dB_t, \qquad (2.19)$$

where $\mu_{\mathrm{cov}} = \epsilon\sum_{i\neq 1}\frac{\lambda_i}{4\lambda_1}$ and $c_{\mathrm{cov}} = \frac{\sum_{i\neq 1}\lambda_i^2(\lambda_1-\lambda_i)}{\lambda_1\sum_{i\neq 1}\lambda_i(\lambda_1-\lambda_i)}$. In [1] an approximate formula for the asymptotic distribution of $x_t$ is provided in the limit when $N \to \infty$ and $\epsilon \to 0$. In Appendix 2.8.7, however, we give a derivation of this stationary distribution (omitted in [1]) for processes of the same form as (2.19), which in this case equals

$$p_\infty(x) \propto \frac{1}{2x}\left(\frac{x}{4x+c_{\mathrm{cov}}}\right)^{\frac{2\mu_{\mathrm{cov}}}{\epsilon c_{\mathrm{cov}}}}\left(\frac{1}{4x+c_{\mathrm{cov}}}\right)^{\frac{1}{2\epsilon b_{\mathrm{cov}}^2}+1}. \qquad (2.20)$$

It is found that their supplied formula omits a non-constant factor of $\frac{1}{2x}$, which for finite values of $N$ and $\epsilon$ (as used in applications) gives their formula a non-negligible bias. By using a one-spiked approximation (as in their case) we get that

$$p_\infty(x) \propto \frac{1}{2x}\left(\frac{x}{4x+c_{\mathrm{cov}}}\right)^{\frac{N-1}{2}}\left(\frac{1}{4x+c_{\mathrm{cov}}}\right)^{\frac{1}{2\epsilon}+1}. \qquad (2.21)$$

In Figure 2.3 we have plotted both (2.21) and the corresponding distribution from [1], as well as their means, for the same parameters as in [1]. Since the exact mean is known to be $\mu_{\mathrm{cov}}$, it is easy to compare against it. It is also noteworthy that the mean using our approximation is more than 1000 times as accurate for the parameters of Figure 2.3. Another benefit of finding the stationary distribution

Figure 2.3: Plot of the asymptotic distribution $p_\infty(x)$ using the formula supplied in [1] (solid line) and using formula (2.21) (dotted line). We have chosen the parameters $N = 200$, $\epsilon = 0.02$ and $c_{\mathrm{cov}} = 0.02$, which implies that $\mu = 0.02$. The vertical lines correspond to the calculated means of the distributions. The relative error of the mean for the formula supplied by [1] is $\approx 10\%$, which can be compared to the error of (2.21), which is $\approx 0.01\%$.
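The mean of (2.21) can also be recomputed by direct numerical integration, using the same parameters as in Figure 2.3 (a sketch; the log-space evaluation merely avoids underflow of the unnormalized density):

```python
import numpy as np

N, eps, c = 200, 0.02, 0.02    # the parameters of Figure 2.3 (mu = 0.02)

x = np.linspace(1e-7, 0.3, 400_000)
# log of the unnormalized density (2.21), evaluated in log-space to avoid underflow
logp = (-np.log(2 * x)
        + 0.5 * (N - 1) * np.log(x / (4 * x + c))
        - (1.0 / (2 * eps) + 1.0) * np.log(4 * x + c))
p = np.exp(logp - logp.max())

mean = (x * p).sum() / p.sum()    # uniform grid, so the spacing cancels
print(mean)                       # close to the exact mean mu = 0.02
```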

is that we can validate the results one step further. Note that the process $x_t$ has the physical interpretation of being one minus the cosine of an angle, and it is thus constrained to lie in $[0, 2]$. To check the lower boundary we can follow [22] and make


sure that the density converges to 0 at 0. Since $c_{\mathrm{cov}}$ is bounded, it is clear that

$$\lim_{x\to 0} p_\infty(x) = \begin{cases} 0 & \text{if } 2\mu_{\mathrm{cov}} > \epsilon c_{\mathrm{cov}} \\ \infty & \text{if } 2\mu_{\mathrm{cov}} < \epsilon c_{\mathrm{cov}}. \end{cases}$$

We will hence refer to the condition $2\mu_{\mathrm{cov}} > \epsilon c_{\mathrm{cov}}$ as the Feller-type condition. The behavior at the upper boundary will also clearly depend on the coefficients. As these are complicated, we do not validate boundedness in the general case; it is not even expected to hold unless $\lambda_1 \gg \lambda_2$. What we can do is look at the one-spiked case and check the validity there. There it follows (just as in [1]) that $\mu_{\mathrm{cov}} \approx \frac{N\epsilon}{4}\frac{\lambda_2}{\lambda_1}$ and $c_{\mathrm{cov}} \approx \frac{\lambda_2}{\lambda_1}$. From this it is clear that the Feller-type condition is verified, as $\epsilon$ is assumed to be small. As for the upper bound of the process, note that the mean $\mu_{\mathrm{cov}}$ will be very small and that the fluctuations will be of order $\epsilon c_{\mathrm{cov}}^{1/2}$ for small $x_t$ and $\epsilon x_t$ for large $x_t$. Hence the fluctuations remain small, as $\epsilon$ is small and the process is mean-reverting, and we will not see any large deviations away from its mean. This can also be seen qualitatively by calculating the stationary distribution numerically for some fixed set of parameters.

Based on (2.19) we can also show, as in Appendix 2.8.8, that the variogram $\overline{(x_{t+\tau} - x_t)^2}$ in the limit $t \to \infty$ satisfies

$$\overline{(x_{t+\tau} - x_t)^2} \approx \frac{2\mu_{\mathrm{cov}}\epsilon\left(\frac{c_{\mathrm{cov}}}{2} + 2\mu_{\mathrm{cov}}\right)}{1 - 2\epsilon}\left(1 - e^{-2\epsilon\tau}\right). \qquad (2.22)$$

Unfortunately, quantities involving $\theta_t$ or $x_t$ have very limited practical applicability, as the true eigenvector is unobservable (it is also the target of the estimation). However, the quantity $\langle\phi_1^t|\phi_1^{t+\tau}\rangle$, $\tau > 0$, is observable and can serve as a good proxy for the $x_t$ process. We have therefore studied this quantity in Appendix 2.8.4 and shown that

$$\overline{\langle\phi_1^t|\phi_1^{t+\tau}\rangle} \approx 1 - 2\mu_{\mathrm{cov}}\left(1 - e^{-\epsilon\tau}\right).$$
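The overlap decay can be illustrated by a direct simulation of the EWMA estimator for a spiked model (a sketch; the model parameters are made up, and we average $|\langle\phi_1^t|\phi_1^{t+\tau}\rangle|$ over time):

```python
import numpy as np

rng = np.random.default_rng(6)
N, eps, lam1 = 30, 0.02, 10.0
phi = np.ones(N) / np.sqrt(N)
E_true = np.eye(N) + (lam1 - 1.0) * np.outer(phi, phi)
L = np.linalg.cholesky(E_true)

T, burn = 4000, 500
E = E_true.copy()
vecs = np.empty((T, N))
for t in range(T):
    r = L @ rng.standard_normal(N)                 # r_t ~ N(0, E_true)
    E = (1.0 - eps) * E + eps * np.outer(r, r)     # EWMA update (2.5)
    vecs[t] = np.linalg.eigh(E)[1][:, -1]          # top eigenvector of E_t

def mean_overlap(tau):
    v = vecs[burn:]
    return np.mean(np.abs(np.sum(v[:-tau] * v[tau:], axis=1)))

print(mean_overlap(1), mean_overlap(100))   # both near 1, and decreasing in tau
```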


2.4.2 Correlation matrices

Similarly to the covariance case, we can also derive an SDE for $x_t$ of a correlation matrix. The details of these calculations can be found in Appendix 2.8.5. The form of the SDE is still the same as in the covariance case, but just as for the top eigenvalue, the coefficients change dramatically. The SDE is

$$dx_t = 2\epsilon\left(\mu_{\mathrm{corr}} - x_t\right)dt + \epsilon b_{\mathrm{corr}}\sqrt{2x_t\left(4x_t + c_{\mathrm{corr}}\right)}\,dB_t, \qquad (2.23)$$

where

$$\begin{aligned}
\mu_{\mathrm{corr}} &= \frac{\epsilon}{4}\left[\sum_{i\neq 1}\frac{\lambda_i}{\lambda_1} + \sum_{i\neq 1}\frac{\left(1+\frac{\lambda_i}{\lambda_1}\right)^2}{2}\alpha_i - \frac{2}{\lambda_1}\sum_{i\neq 1}\gamma_i\left(1+\frac{\lambda_i}{\lambda_1}\right)\right], \\
b_{\mathrm{corr}} &= \sqrt{1 + \alpha_1 - \frac{2\gamma_1}{\lambda_1}}, \\
c_{\mathrm{corr}} &= \frac{\displaystyle\sum_{i\neq 1}\frac{\left[\cos^2(\theta)\lambda_1\lambda_i + \frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i - 2\gamma_i(\lambda_1+\lambda_i)\right]^2}{\lambda_1-\lambda_i}}{\displaystyle\lambda_1^2\, b_{\mathrm{corr}}^2\sum_{i\neq 1}\frac{\cos^2(\theta)\lambda_1\lambda_i + \frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i - 2\gamma_i(\lambda_1+\lambda_i)}{\lambda_1-\lambda_i}}. \qquad (2.24)
\end{aligned}$$

Here we have also defined, for any $h$,

$$\begin{aligned}
\alpha_h &= \sum_{i=1}^{N}\sum_{j=1}^{N}\phi_{1,i}^{t-1}[C_{t-1}]_{i,j}\phi_{1,j}^{t-1}\,\phi_{h,i}^{t-1}[C_{t-1}]_{i,j}\phi_{h,j}^{t-1}, \\
\gamma_h &= \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N}\phi_{1,i}^{t-1}[C_{t-1}]_{i,k}\phi_{1,k}^{t-1}\,\phi_{h,j}^{t-1}[C_{t-1}]_{j,k}\phi_{h,k}^{t-1}. \qquad (2.25)
\end{aligned}$$

Just as for the difference in the mean of the top eigenvalue between covariance and correlation matrices, we can also here (though not as easily) see that the mean in the correlation case is substantially smaller than in the covariance case. This, together with the extra factor $b_{\mathrm{corr}}$, which is smaller than 1, means that there is also a stability effect when we estimate the eigenvector of the correlation instead of the covariance.

Using Appendix 2.8.7 again, we can see that the stationary solution for $x_t$ has the density

$$p_\infty(x) \propto \frac{1}{2x}\left(\frac{x}{4x+c_{\mathrm{corr}}}\right)^{\frac{2\mu_{\mathrm{corr}}}{\epsilon c_{\mathrm{corr}}b_{\mathrm{corr}}^2}}\left(\frac{1}{4x+c_{\mathrm{corr}}}\right)^{\frac{1}{2\epsilon b_{\mathrm{corr}}^2}+1}. \qquad (2.26)$$

It is thus clear that the corresponding limit at 0 in this case is

$$\lim_{x\to 0} p_\infty(x) = \begin{cases} 0 & \text{if } 2\mu_{\mathrm{corr}} > \epsilon c_{\mathrm{corr}}b_{\mathrm{corr}}^2 \\ \infty & \text{if } 2\mu_{\mathrm{corr}} < \epsilon c_{\mathrm{corr}}b_{\mathrm{corr}}^2, \end{cases}$$

which yields a similar Feller type condition to the covariance case.If we compare this asymptotic distribution to the one for covariance matrices we

can see that we have both a shift in mean from changing ccov to ccorr, but also a re-duction in the variance and tails from the extra b2

corr factors. In Figure 2.4 we plotted(2.26) from a one-factor covariance model and compared that to the correspondingvalue in the covariance case from (2.21). Even though we have chosen the param-eters to be in a realistic condition the di�erence between these two distributions issignificant.
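The density (2.26) can be normalized and summarized numerically; working in log space avoids overflow from the large exponents. The parameter values below are the correlation ones quoted in the Figure 2.4 caption and are illustrative only.

```python
import numpy as np

# Numerical normalization of the stationary density (2.26), in log space to
# avoid overflow; parameters are the Figure 2.4 correlation values.
def log_p_inf(x, mu, b, c, eps):
    a1 = 2.0 * mu / (eps * c * b**2)        # exponent 2*mu/(eps*c_corr*b_corr^2)
    a2 = 1.0 / (2.0 * eps * b**2) + 1.0     # exponent 1/(2*eps*b_corr^2) + 1
    return (-np.log(2.0 * x) + a1 * (np.log(x) - np.log(4.0 * x + c))
            - a2 * np.log(4.0 * x + c))

mu, b, c, eps = 0.02, 0.48, 0.045, 0.02
x = np.linspace(1e-6, 0.2, 200_001)
lp = log_p_inf(x, mu, b, c, eps)
p = np.exp(lp - lp.max())
dx = x[1] - x[0]
p /= p.sum() * dx                           # normalize on the grid
mean = (x * p).sum() * dx
print(mean)                                 # lands in the vicinity of mu
```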

To get a quantitative picture and some intuition about the difference between the covariance and correlation cases, we have again resorted to an approximate one-spiked correlation model. Under this model it holds that b_corr ≈ 1 − ρ, which means that a higher average correlation implies lower fluctuations. Also, μ_corr ≈ (Nε/4)·λ₂³/λ₁ and c_corr ≈ μ_corr/b_corr². A first note, again, is that since ε is small the Feller-type condition is satisfied. As for the differences to the covariance case, it is clear that the value of μ_corr is much smaller than the one that would be obtained in the covariance case with the same parameters. This is because for a one-spiked correlation model it is true that λ₂ ≈ 1 − ρ < 1, so λ₂³ < λ₂. Also, since b_corr < 1, the general level of fluctuation will be lower for correlation matrices.

As for covariance matrices, we also look here at the quantity ⟨φ^t_1|φ^{t+τ}_1⟩, τ > 0. It follows that this quantity satisfies

$$\overline{\langle\phi^t_1|\phi^{t+\tau}_1\rangle} \approx 1 - 2\mu_{corr}\left(1 - e^{-\epsilon\tau}\right),$$


Figure 2.4: Plot of the asymptotic distribution p_∞(x) for a covariance matrix (solid line) and for the corresponding correlation matrix (2.21) (dotted line). We have chosen the parameters to be N = 50, ε = 0.02 and c_cov = 0.02, which implies that μ = 0.02. To get the correlation values we randomly generated a one-factor model with the mentioned parameters, which has λ₁ ≈ 23.5, b_corr ≈ 0.48 and c_corr ≈ 0.045. We then simulated this and calculated the α_i's and γ_i's from the simulated data. The vertical lines correspond to the calculated means of the distributions.

as shown in Appendix 2.8.6. We also have from Appendix 2.8.8 that the variogram is

$$\overline{(x_{t+\tau} - x_t)^2} \approx \frac{2\mu_{corr}\,\epsilon b_{corr}^2\left(\frac{c_{corr}}{2} + 2\mu_{corr}\right)}{1 - 2\epsilon b_{corr}^2}\left(1 - e^{-2\epsilon\tau}\right).$$

2.5 Extension to moving window estimators

We now look at how the results in the eigenvalue section can be extended to the case of a moving window sample correlation matrix; the same can be done for the eigenvector case as well. We will assume that the estimation is done over a sliding window of length T, i.e. we only use the T most recent observations. Hence at each time point the oldest observation is removed and a new one is added. This estimator is commonly used in academic studies since its properties are easier to study and it is more robust to single outliers.


2.5.1 Covariance matrix

For the covariance case we define this update as

$$E_t = \frac{1}{T-1}\sum_{s=t-T+1}^{t} |r_s\rangle\langle r_s| = \frac{1}{T-1}\sum_{s=t-T}^{t-1} |r_s\rangle\langle r_s| + \frac{1}{T-1}\left(|r_t\rangle\langle r_t| - |r_{t-T}\rangle\langle r_{t-T}|\right) = E_{t-1} + \epsilon\left(|r_t\rangle\langle r_t| - |r_{t-T}\rangle\langle r_{t-T}|\right),$$

where we have defined ε = 1/(T − 1). By now projecting the above on ⟨φ^{t−1}_1| and |φ^{t−1}_1⟩ we get that

$$\lambda^t_1 = \lambda^{t-1}_1 + \epsilon\langle\phi^{t-1}_1|E - E_{t-1}|\phi^{t-1}_1\rangle + \epsilon\langle\phi^{t-1}_1|\left(|r_t\rangle\langle r_t| - E\right)|\phi^{t-1}_1\rangle - \epsilon\langle\phi^{t-1}_1|\left(|r_{t-T}\rangle\langle r_{t-T}| - E_{t-1}\right)|\phi^{t-1}_1\rangle.$$

Using the Gaussianity of the returns, we can rewrite the above as

$$\lambda^t_1 \approx \lambda^{t-1}_1 + \epsilon\left(\cos^2(\theta_{t-1})\lambda_1 - \lambda^{t-1}_1\right) + \epsilon\left[\lambda_1\cos^2(\theta_{t-1})\,\xi_{1,t} + \lambda^{t-1}_1\,\xi_{2,t}\right], \tag{2.27}$$

where ξ_{i,t} has mean 0 and variance 2. Since these two variables follow a centered chi-square distribution and are independent, their variances are additive. Hence, in the same limit as in Section 2.3.1, it follows that (2.27) converges to the SDE

$$d\lambda^t_1 = \epsilon\left[\left(\cos^2(\theta_t)\lambda_1 - \lambda^t_1\right)dt + \sqrt{2}\,\lambda_1\cos^2(\theta_t)\,dB_{1,t} + \sqrt{2}\,\lambda^t_1\,dB_{2,t}\right]. \tag{2.28}$$
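The distributional claim for ξ_{i,t} in (2.27) can be checked directly: under the cos(θ_t) ≈ 1 approximation, the first noise term is (⟨φ₁|r_t⟩² − λ₁)/λ₁ with ⟨φ₁|r_t⟩ ~ N(0, λ₁), i.e. z² − 1 for standard normal z. A minimal sketch (λ₁ borrowed from the Section 2.6 setup):

```python
import numpy as np

# Check that xi = (<phi_1|r_t>^2 - lambda_1)/lambda_1 has mean 0 and variance 2
# when <phi_1|r_t> ~ N(0, lambda_1), i.e. xi = z^2 - 1 with z standard normal.
rng = np.random.default_rng(2)
lam1 = 200.0
proj = np.sqrt(lam1) * rng.standard_normal(1_000_000)   # samples of <phi_1|r_t>
xi = (proj**2 - lam1) / lam1
print(xi.mean(), xi.var())   # close to 0 and 2
```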

Hence we end up with a very similar SDE compared to the case of an EWMA estimator, with the only difference being the extra state-dependent variance term. Notice, however, that in theory there is an autocorrelation in the differential which we have ignored here. By again using that cos(θ_t) ≈ 1 in the case λ₁ ≫ λ₂, we show in Appendix 2.8.9 that the variogram of (2.28) satisfies

$$\overline{(\lambda^{t+\tau}_1 - \lambda^t_1)^2} \approx \frac{4\epsilon(\lambda_1)^2}{1-\epsilon}\left(1 - e^{-\epsilon\tau}\right),$$


for any τ > 0. Hence we can see that, compared to the EWMA case of (2.12), we here get an extra scaling factor of 2/(1 − ε). The factor 2 comes from the fact that we are both adding and removing a term in the covariance update. The extra 1/(1 − ε) part can be understood as the inverse of the decay factor of the exponential weighting scheme, so we can think of it as being obtained by rescaling the impact of previous returns in each update from (1 − ε) to 1.

As noted above, we have ignored an autocovariance in the differential of the SDE here. Its effect is that we get an exponential decay in the variogram which would disappear through cancellation if it were taken into account. The asymptotic values of the two cases are, however, the same. As the underlying covariance matrices are independent when τ > T, this implies that after this time the variogram should have the same value (the asymptotic value) for any τ. We thus recover a slightly modified version of the result used, but not shown, in [1] that

$$\overline{(\lambda^{t+\tau}_1 - \lambda^t_1)^2}\,\Big|_{\tau>T} \approx \frac{4\epsilon(\lambda_1)^2}{1-\epsilon} = \frac{4(\lambda_1)^2}{T-2}.$$
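The add-one/remove-one window update defined at the beginning of this section can be sketched and checked against a from-scratch computation of the window estimate (the data below are arbitrary):

```python
import numpy as np

def window_cov(returns, T):
    """Sliding-window estimator E_t, updated by adding the newest and removing
    the oldest rank-one term, with eps = 1/(T-1) as in the text."""
    eps = 1.0 / (T - 1)
    E = sum(np.outer(r, r) for r in returns[:T]) / (T - 1)
    yield E
    for t in range(T, len(returns)):
        E = E + eps * (np.outer(returns[t], returns[t])
                       - np.outer(returns[t - T], returns[t - T]))
        yield E

rng = np.random.default_rng(1)
R = rng.standard_normal((30, 4))
T = 10
for E in window_cov(R, T):
    last = E
batch = sum(np.outer(r, r) for r in R[-T:]) / (T - 1)
print(np.abs(last - batch).max())   # floating-point noise only
```

The rank-one update costs O(N²) per step instead of the O(TN²) of recomputing the whole window.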

2.5.2 Correlation matrix

As before, it holds similarly for the sample correlation that

$$C_t = V_t\left(\frac{1}{T-1}\sum_{s=t-T+1}^{t}|r_s\rangle\langle r_s|\right)V_t = V_t E_{t-1} V_t + \epsilon V_t\left(|r_t\rangle\langle r_t| - |r_{t-T}\rangle\langle r_{t-T}|\right)V_t$$
$$\approx C_{t-1} + \Delta V_t E_{t-1} V_{t-1} + V_{t-1} E_{t-1}\Delta V_t + \epsilon V_t\left(|r_t\rangle\langle r_t| - |r_{t-T}\rangle\langle r_{t-T}|\right)V_t. \tag{2.29}$$

Here V_t is, similarly to Section 2.3.2, defined as the diagonal matrix with diagonal entries (V_t)_{i,i} = 1/√((E_{t−1})_{i,i} + ε(r²_{t,i} − r²_{t−T,i})) = 1/√((E_t)_{i,i}). Since the covariance matrix is updated differently, this will directly affect the value of ΔV_t. We now have that

$$(\Delta V_t)_{i,i} = (V_t)_{i,i} - (V_{t-1})_{i,i} = \frac{1}{\sqrt{(E_{t-1})_{i,i} + \epsilon(r^2_{t,i} - r^2_{t-T,i})}} - \frac{1}{\sqrt{(E_{t-1})_{i,i}}}$$
$$\approx -\frac{\epsilon}{2}\left[r^2_{t,i} - r^2_{t-T,i}\right]\frac{1}{(E_{t-1})^{3/2}_{i,i}} = \frac{\epsilon}{2\sqrt{(E_{t-1})_{i,i}}}\left(\frac{r^2_{i,t-T}}{(E_{t-1})_{i,i}} - \frac{r^2_{i,t}}{(E_{t-1})_{i,i}}\right). \tag{2.30}$$
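The normalization by V_t above is the standard covariance-to-correlation map, which can be sketched as follows (on arbitrary data, independent of the return model):

```python
import numpy as np

# C = V E V with (V)_{ii} = 1/sqrt(E_{ii}): the covariance-to-correlation map.
def to_correlation(E):
    v = 1.0 / np.sqrt(np.diag(E))
    return E * np.outer(v, v)            # elementwise form of V E V, V diagonal

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 4))
E = A.T @ A / (A.shape[0] - 1)           # a sample covariance (PSD)
C = to_correlation(E)
print(np.diag(C))                        # unit diagonal by construction
```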

By now, as in all previous cases, projecting (2.29) onto the eigenvector at time t − 1 from left and right, we get to first-order accuracy in ε that

$$\lambda^t_1 \approx \lambda^{t-1}_1 + 2\langle\phi_{1,t-1}|\Delta V_t E_{t-1} V_{t-1}|\phi_{1,t-1}\rangle + \epsilon\langle\phi_{1,t-1}|V_t\left(|r_t\rangle\langle r_t| - |r_{t-T}\rangle\langle r_{t-T}|\right)V_t|\phi_{1,t-1}\rangle$$
$$\approx \lambda^{t-1}_1 + \epsilon\lambda^{t-1}_1\langle\phi_{1,t-1}|V_{t-1}\left(\delta r^2_{t-T} - \delta r^2_t\right)V_{t-1}|\phi_{1,t-1}\rangle + \epsilon\langle\phi_{1,t-1}|C|\phi_{1,t-1}\rangle - \epsilon\langle\phi_{1,t-1}|C_{t-1}|\phi_{1,t-1}\rangle$$
$$\quad + \epsilon\langle\phi_{1,t-1}|\left(V_t|r_t\rangle\langle r_t|V_t - C\right)|\phi_{1,t-1}\rangle - \epsilon\langle\phi_{1,t-1}|\left(V_t|r_{t-T}\rangle\langle r_{t-T}|V_t - C_{t-1}\right)|\phi_{1,t-1}\rangle$$
$$\approx \lambda^{t-1}_1 + \epsilon\left(\cos^2(\theta_t)\lambda_1 - \lambda^{t-1}_1\right) + \epsilon\langle\phi_{1,t-1}|V_{t-1}\zeta^t_1 V_{t-1}|\phi_{1,t-1}\rangle + \epsilon\langle\phi_{1,t-1}|V_{t-1}\zeta^t_2 V_{t-1}|\phi_{1,t-1}\rangle,$$

where ζ^t_1 = |r_t⟩⟨r_t| − E − (λ^{t−1}_1 δr²_t − E_{t−1}) and ζ^t_2 = |r_{t−T}⟩⟨r_{t−T}| − E_{t−1} − (λ^{t−1}_1 δr²_{t−T} − E_{t−1}). Notice that the only dependence between ζ^t_1 and ζ^t_2 comes through E_{t−1}, whose variability is smaller than that of δr²_t; we shall therefore approximate these matrices as being independent. The distributions of these matrices are also approximately the same. Hence, by again passing to the limit, we get that the eigenvalue satisfies the SDE

$$d\lambda^t_1 = \epsilon\left[\left(\cos^2(\theta_t)\lambda_1 - \lambda^t_1\right)dt + \sigma\cos^2(\theta_t)\,dB_{1,t} + \sigma\,dB_{2,t}\right], \tag{2.31}$$

where, just as in Section 2.3.2, we have σ² = 2(1 + α₁)(λ₁)² − 4γ₁λ₁. By again using the result from Appendix 2.8.1, we get that the variogram of the eigenvalue is

$$\overline{(\lambda^{t+\tau}_1 - \lambda^t_1)^2} \approx 2\sigma^2\epsilon\left(1 - e^{-\epsilon\tau}\right).$$


So by the same argument as in Section 2.5.1 we get that

$$\overline{(\lambda^{t+\tau}_1 - \lambda^t_1)^2}\,\Big|_{\tau>T} \approx 2\epsilon\sigma^2 = \frac{2\sigma^2}{T-1}.$$

Hence all the discussion from Section 2.3.2 on the difference between covariance and correlation applies in exactly the same way when the EWMA estimator is replaced by a flat-weighted estimator with a sliding window.

2.6 Simulation verification

In order to verify the validity of all the results, we illustrate them in this section by performing Monte-Carlo simulations of the model used. In each of the examples we generate N = 10^6 values r_t from a N(0, Σ) distribution with structure (2.2), where the dimension is set to M = 50. For each of the cases we will, w.l.o.g., use the empirical fact that the top eigenvector is entirely positive. When calculating values of interest we also discard the first 3000 sample estimates to remove any impact of the initial values. We naturally split the section into one part on the eigenvalues and one on the eigenvectors.
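The simulation setup just described can be sketched as follows; the spectrum is the K = 5 one stated in Section 2.6.1, while the step count and burn-in are reduced from the 10^6 and 3000 of the text to keep the example fast.

```python
import numpy as np

# Sketch of the Section 2.6 Monte-Carlo setup: returns drawn from N(0, Sigma)
# with a spiked spectrum, EWMA covariance update, tracked top eigenvalue.
M, eps, n_steps = 50, 0.04, 4000
eigvals = np.array([200.0] + [5.0] * 4 + [1.0] * (M - 5))   # K = 5 factor model
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((M, M)))
Sigma = (Q * eigvals) @ Q.T                                 # Q diag(eigvals) Q^T

rng = np.random.default_rng(5)
L = np.linalg.cholesky(Sigma)
E = Sigma.copy()                      # start the estimator at the truth
top = np.empty(n_steps)
for t in range(n_steps):
    r = L @ rng.standard_normal(M)
    E = (1 - eps) * E + eps * np.outer(r, r)    # EWMA update
    top[t] = np.linalg.eigvalsh(E)[-1]
print(top[500:].mean())   # fluctuates around lambda_1 = 200
```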

2.6.1 Eigenvalue results

We here look at the case of a 5-factor model (i.e. K = 5) where λ₁ = 200, λ₂ = ⋯ = λ₅ = 5 and λ_i = 1 for all i > 5. This case clearly satisfies the constraint λ₁ ≫ λ₂. Note again that, as the mapping between the eigenspaces of the covariance and correlation matrices is non-trivial, this complicates matters, since even though the covariance model is a spiked model the correlation is not necessarily so. Here the correlation eigenvalues satisfy λ₁/λ₂ ≈ 25. If we choose ε = 0.04, which seems to be the most common choice in practice [43], we obtain the results shown in Figures 2.5-2.6.

We can see in Figure 2.5 that when using an EWMA estimator, the theoretical eigenvalue dynamics seem to correspond to the actual ones. This is also true for the stationary distribution, as seen in Figure 2.6. When looking at the eigenvalue


Figure 2.5: Variogram for the correlation eigenvalue (left) and covariance eigenvalue (right) in the case K = 5 and ε = 0.04, when using an EWMA estimator. The solid line represents the simulated average value and the dotted line is the theoretical value.

Figure 2.6: Stationary distribution for the correlation eigenvalue (left) and covariance eigenvalue (right) in the case K = 5 and T = 1/ε = 20, when using an EWMA estimator. The solid line represents the simulated average value and the dotted line is the theoretical value.

dynamics under the moving window estimator, we can see in Figure 2.7 that the observed variogram looks more like a step function than an exponential. This comes from the fact that the estimator is defined by removing an old sample and adding a new one. Hence after doing this for T steps, all consecutive estimates should have the


Figure 2.7: Variogram for the correlation eigenvalue (left) and covariance eigenvalue (right) in the case K = 5 and ε = 0.04, when using a moving window estimator. The solid line represents the simulated average value and the dotted line is the theoretical value.

same value in its variogram with respect to the starting estimate, as their information sets are disjoint. The reason why our theoretical value is still an exponential is that we overlook this fact, as explained in Section 2.5.

2.6.2 Eigenvector results

When looking at the top eigenvector dynamics, one can see using the above parameter values that there will be a gap between the theoretical and empirical values. By varying the parameter values, our conclusion is that the eigenvector perturbations are substantially more sensitive to the assumption that ε → 0: if one chooses ε smaller and smaller, the relative error of the theoretical value decreases. Our conclusion from this is that the perturbation result is correct in the asymptotic regime, but it is not applicable for practical values of ε (around 0.04, as in [43]). Hence the theory still gives an intuition of the underlying mechanisms, but in its current form it is not usable for direct applications unless one can find a way to correct the parameter values. To show that the theory still holds in the asymptotic case, we have chosen to do simulations for K = 1 with λ₁ = 100 and ε = 0.0005. This covariance model corresponds to having λ₁/λ₂ ≈ 50 for the correlation matrix.


In Figure 2.8 we can see that the theoretical and empirical values match approximately.

Figure 2.8: Variogram for the correlation eigenvector (left) and covariance eigenvector (right) in the case K = 1 and ε = 0.0001, when using an EWMA estimator. The solid line represents the simulated average value and the dotted line is the theoretical value.

2.7 Conclusions

We have studied the dynamics of the top eigenvalue and eigenvector of correlation and covariance matrices. Previously these two have been assumed to possess exactly the same behavior, since in a static situation this should be the case. We show, however, that while in both cases the dynamics follow the same class of equations, the analytic form of the parameters is substantially different, which gives the two very different behaviors. We have shown that these results hold for both moving window estimators and EWMA estimators, and also how the same analysis could be done for any other linear estimator. These results explain the empirical observations that [63] made about the difference in correlation and covariance dynamics for financial data. They also explain the differences noted by [9]. We have also extended the models of [1] to include models where the bulk is non-uniform, since a uniform bulk is in principle never the case for correlation matrices; their result is thus recovered as a special case here.


We have also demonstrated that these results are very sensitive with respect to the averaging parameter. Hence one should be careful before applying this kind of approximate model to empirical data if the purpose is to detect non-stationarity. One could, however, always use these results to gain a general intuition of the underlying dynamics.

2.8 Appendix

In this appendix we show all the results mentioned in the chapter. In Sections 2.8.1, 2.8.8 and 2.8.9 we derive the variograms for the processes λ^t_1 and x_t. In Section 2.8.2 we calculate the distribution of the perturbation term that came up in the analysis of the top eigenvalue. Sections 2.8.3-2.8.6 derive the general results for the eigenvector dynamics, and in Section 2.8.7 the stationary distribution of a generic x_t is found.

2.8.1 Variogram for Ornstein Uhlenbeck process

Assume that y_t satisfies the SDE

$$dy_t = \kappa(\mu - y_t)\,dt + \sigma\,dB_t.$$

Then we can see that

$$y_t = y_0 + \kappa\mu t - \kappa\int_0^t y_s\,ds + \sigma B_t,$$
$$\overline{y_t} = y_0 + \kappa\mu t - \kappa\int_0^t \overline{y_s}\,ds.$$

From this we get that $\overline{y_t} = y_0 e^{-\kappa t} + \mu(1 - e^{-\kappa t})$. By integrating and taking expectations, $\overline{y^2_t}$ satisfies the ODE

$$\overline{y^2_t} = y^2_0 + 2\kappa\mu\int_0^t \overline{y_s}\,ds - 2\kappa\int_0^t \overline{y^2_s}\,ds + \sigma^2 t$$
$$= y^2_0 + 2\mu\left[y_0 + \kappa\mu t - y_0 e^{-\kappa t} - \mu(1 - e^{-\kappa t})\right] - 2\kappa\int_0^t \overline{y^2_s}\,ds + \sigma^2 t.$$


Solving this gives us that $\overline{y^2_t} = y^2_0 e^{-2\kappa t} + \left(\mu^2 + \frac{\sigma^2}{2\kappa}\right)\left[1 - e^{-2\kappa t}\right] + 2\mu(y_0 - \mu)\left[e^{-\kappa t} - e^{-2\kappa t}\right]$. Using this we get that, for any τ > 0,

$$\overline{y_t y_{t+\tau}} = \overline{y^2_t} + \kappa\mu\tau\,\overline{y_t} - \kappa\int_0^\tau \overline{y_t y_{t+s}}\,ds.$$

Hence for any fixed τ we can treat the above equation as an ODE by conditioning on the information at time t, which gives us that $\overline{y_t y_{t+\tau}} = \overline{y^2_t}\,e^{-\kappa\tau} + \mu\,\overline{y_t}\left(1 - e^{-\kappa\tau}\right)$. Putting all this together gives us that the variogram v_t(τ) for a fixed t is

$$v_t(\tau) = \overline{y^2_{t+\tau}} + \overline{y^2_t} - 2\,\overline{y_t y_{t+\tau}}$$
$$= \overline{y^2_t}\,e^{-2\kappa\tau} + \left(\mu^2 + \frac{\sigma^2}{2\kappa}\right)\left[1 - e^{-2\kappa\tau}\right] + 2\mu(\overline{y_t} - \mu)\left[e^{-\kappa\tau} - e^{-2\kappa\tau}\right] + \overline{y^2_t} - 2\,\overline{y^2_t}\,e^{-\kappa\tau} - 2\mu\,\overline{y_t}\left(1 - e^{-\kappa\tau}\right)$$
$$= \overline{y^2_t}\left(1 - e^{-\kappa\tau}\right)^2 + \left(\mu^2 + \frac{\sigma^2}{2\kappa}\right)\left[1 - e^{-2\kappa\tau}\right] - 2\mu\left(1 - e^{-\kappa\tau}\right)\left[\overline{y_t}\left(1 - e^{-\kappa\tau}\right) + \mu e^{-\kappa\tau}\right].$$

We can now use that $\overline{y_t} = \mu$ and $\overline{y^2_t} = \mu^2 + \frac{\sigma^2}{2\kappa}$ to get that the variogram is

$$v(\tau) = \overline{y^2_{t+\tau}} + \overline{y^2_t} - 2\,\overline{y_t y_{t+\tau}}$$
$$= \left(\mu^2 + \frac{\sigma^2}{2\kappa}\right)\left[1 - e^{-\kappa\tau}\right]^2 + \left(\mu^2 + \frac{\sigma^2}{2\kappa}\right)\left[1 - e^{-2\kappa\tau}\right] - 2\mu\left(1 - e^{-\kappa\tau}\right)\left[\mu\left(1 - e^{-\kappa\tau}\right) + \mu e^{-\kappa\tau}\right]$$
$$= 2\left(\mu^2 + \frac{\sigma^2}{2\kappa}\right)\left[1 - e^{-\kappa\tau}\right] - 2\mu^2\left(1 - e^{-\kappa\tau}\right) = \frac{\sigma^2}{\kappa}\left(1 - e^{-\kappa\tau}\right).$$
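The closed form just derived can be checked against simulation; the sketch below samples many stationary OU paths with the exact one-step transition (the parameter values are arbitrary):

```python
import numpy as np

# Numerical check of v(tau) = (sigma^2/kappa)(1 - e^{-kappa*tau}) using exact
# sampling of the OU transition from a stationary start.
kappa, mu, sigma = 0.5, 1.0, 0.3
dt, n_lag, n_paths = 0.1, 50, 20_000
rng = np.random.default_rng(6)

a = np.exp(-kappa * dt)
s = sigma * np.sqrt((1 - a**2) / (2 * kappa))      # exact one-step noise scale
y0 = mu + np.sqrt(sigma**2 / (2 * kappa)) * rng.standard_normal(n_paths)
y = y0.copy()
for _ in range(n_lag):                              # advance by tau = n_lag*dt
    y = mu + (y - mu) * a + s * rng.standard_normal(n_paths)

tau = n_lag * dt
emp = ((y - y0) ** 2).mean()
theory = sigma**2 / kappa * (1 - np.exp(-kappa * tau))
print(emp, theory)   # should agree to within a few percent
```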


2.8.2 Variance of correlation differential

We here look at the term ⟨φ_{1,t−1}|V_{t−1}ζ_tV_{t−1}|φ_{1,t−1}⟩ in (2.15) and derive its mean and variance. Recall that ζ_t = |r_t⟩⟨r_t| − E − (λ^{t−1}_1 δr²_t − E_{t−1}) and notice that

$$\overline{\langle\phi^{t-1}_1|V_{t-1}\zeta_t V_{t-1}|\phi^{t-1}_1\rangle} = \sum_{i=1}^N\sum_{j=1}^N \phi^{t-1}_{1,i}\,\overline{(V_{t-1}\zeta_t V_{t-1})_{i,j}}\,\phi^{t-1}_{1,j}$$
$$= \sum_{i=1}^N\sum_{j=1}^N \phi^{t-1}_{1,i}(V_{t-1})_i\,\overline{(r_{t,i}r_{t,j} - E_{i,j})}\,(V_{t-1})_j\phi^{t-1}_{1,j} - \sum_{i=1}^N\sum_{j=1}^N \phi^{t-1}_{1,i}(V_{t-1})_i\,\overline{(\lambda^{t-1}_1\delta r^2_t - E_{t-1})_{i,j}}\,(V_{t-1})_j\phi^{t-1}_{1,j}$$
$$= -\lambda^{t-1}_1\sum_{i=1}^N(\phi^{t-1}_{1,i})^2\,(V_{t-1})^2_i\,\overline{(\delta r^2_t)_{i,i}} + \lambda^{t-1}_1 = \lambda^{t-1}_1\left(\|\phi^{t-1}_1\|^2_2 - 1\right) = 0,$$

using the fact that $\overline{(r_{t,i}r_{t,j} - E_{i,j})} = 0$ for all i, j (this lets us replace the cross moment by the covariance). This implies that the quantity we are interested in is

$$\overline{\langle\phi^{t-1}_1|V_{t-1}\zeta_t V_{t-1}|\phi^{t-1}_1\rangle^2} = \overline{\left(\sum_{i=1}^N\sum_{j=1}^N \phi^{t-1}_{1,i}(V_{t-1}\zeta_t V_{t-1})_{i,j}\phi^{t-1}_{1,j}\right)^2} = \overline{\left(\sum_{i=1}^N\sum_{j=1}^N \phi^{t-1}_{1,i}(V_{t-1})_i(\zeta_t)_{i,j}(V_{t-1})_j\phi^{t-1}_{1,j}\right)^2}$$
$$= \sum_{i=1}^N\sum_{j=1}^N\sum_{k=1}^N\sum_{l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{1,j}\phi^{t-1}_{1,k}\phi^{t-1}_{1,l}\,(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\,\overline{\zeta_{i,j}\zeta_{k,l}}, \tag{2.32}$$


where

$$\overline{\zeta_{i,j}\zeta_{k,l}} = \overline{(r_{t,i}r_{t,j} - E_{i,j})(r_{t,k}r_{t,l} - E_{k,l})} + \overline{\left(\lambda^{t-1}_1[\delta r_t]^2_{i,j} - [E_{t-1}]_{i,j}\right)\left(\lambda^{t-1}_1[\delta r_t]^2_{k,l} - [E_{t-1}]_{k,l}\right)}$$
$$- \mathrm{Cov}\left(r_{t,i}r_{t,j} - E_{i,j},\; \lambda^{t-1}_1[\delta r_t]^2_{k,l} - [E_{t-1}]_{k,l}\right) - \mathrm{Cov}\left(r_{t,k}r_{t,l} - E_{k,l},\; \lambda^{t-1}_1[\delta r_t]^2_{i,j} - [E_{t-1}]_{i,j}\right). \tag{2.33}$$

We will now look at each of these three classes of terms separately. Before doing so, we define the quantities

$$\alpha^t_1 = \sum_{i=1}^N\sum_{j=1}^N\left(\phi^{t-1}_{1,i}(V_{t-1})_i[E_{t-1}]_{i,j}(V_{t-1})_j\phi^{t-1}_{1,j}\right)^2,$$

$$\beta^t_1 = \sum_{i=1}^N\sum_{j=1}^N\left(\phi^{t-1}_{1,i}(V_{t-1})_i\right)^3[E_{t-1}]_{i,i}[E_{t-1}]_{i,j}(V_{t-1})_j\phi^{t-1}_{1,j} = \sum_{i=1}^N\sum_{j=1}^N(\phi^{t-1}_{1,i})^3(V_{t-1})_i[E_{t-1}]_{i,j}(V_{t-1})_j\phi^{t-1}_{1,j},$$

$$\gamma^t_1 = \sum_{i=1}^N\sum_{j=1}^N\sum_{k=1}^N \phi^{t-1}_{1,i}(V_{t-1})_i[E_{t-1}]_{i,k}(V_{t-1})_k\phi^{t-1}_{1,k}\;\phi^{t-1}_{1,j}(V_{t-1})_j[E_{t-1}]_{j,k}(V_{t-1})_k\phi^{t-1}_{1,k},$$

which will prove to be very useful. Notice that above we used the fact that $(V_{t-1})_i[E_{t-1}]_{i,i}(V_{t-1})_i = 1$ for all i.

If we now look at the first term in (2.33), we can use the Gaussianity of the returns to get that

$$\overline{(r_{t,i}r_{t,j} - E_{i,j})(r_{t,k}r_{t,l} - E_{k,l})} = E_{i,k}E_{j,l} + E_{i,l}E_{j,k},$$

and thus

$$\sum_{i,j,k,l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{1,j}\phi^{t-1}_{1,k}\phi^{t-1}_{1,l}(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\,\overline{(r_{t,i}r_{t,j} - E_{i,j})(r_{t,k}r_{t,l} - E_{k,l})} \approx 2(\lambda^t_1)^2.$$


Next, using the Gaussianity, we get that

$$\overline{\left(\lambda^{t-1}_1[\delta r_t]^2_{i,j} - [E_{t-1}]_{i,j}\right)\left(\lambda^{t-1}_1[\delta r_t]^2_{k,l} - [E_{t-1}]_{k,l}\right)} = \begin{cases}
[2(\lambda^{t-1}_1)^2 + (\lambda^{t-1}_1 - 1)^2]E^2_{i,i}, & \text{if } i = j = k = l \\
2(\lambda^{t-1}_1)^2E^2_{i,k} + (\lambda^{t-1}_1 - 1)^2E_{i,i}E_{k,k}, & \text{if } i = j \neq k = l \\
E^2_{i,j}, & \text{if } i = k \neq j = l \\
E^2_{i,k}, & \text{if } i = l \neq j = k \\
-(\lambda^{t-1}_1 - 1)E_{i,i}E_{i,l}, & \text{if } i = j = k \neq l \\
-(\lambda^{t-1}_1 - 1)E_{i,i}E_{i,k}, & \text{if } i = j = l \neq k \\
-(\lambda^{t-1}_1 - 1)E_{i,k}E_{k,k}, & \text{if } i \neq j = k = l \\
-(\lambda^{t-1}_1 - 1)E_{j,k}E_{k,k}, & \text{if } j \neq i = k = l \\
-(\lambda^{t-1}_1 - 1)E_{i,i}E_{k,l}, & \text{if } i = j \neq k \neq l \\
-(\lambda^{t-1}_1 - 1)E_{i,j}E_{k,k}, & \text{if } i \neq j \neq k = l \\
E_{i,k}E_{k,l}, & \text{if } i \neq j = k \neq l \\
E_{i,j}E_{i,l}, & \text{if } i = k \neq j \neq l \\
E_{i,j}E_{i,k}, & \text{if } i = l \neq k \neq j \\
E_{i,j}E_{i,k}, & \text{if } i \neq j = l \neq k \\
E_{i,j}E_{k,l}, & \text{if } i \neq j \neq k \neq l
\end{cases}$$

So by simply inserting this into the original expression we get that

$$\sum_{i,j,k,l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{1,j}\phi^{t-1}_{1,k}\phi^{t-1}_{1,l}(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\,\overline{\left(\lambda^{t-1}_1[\delta r_t]^2_{i,j} - [E_{t-1}]_{i,j}\right)\left(\lambda^{t-1}_1[\delta r_t]^2_{k,l} - [E_{t-1}]_{k,l}\right)}$$
$$= (\lambda^{t-1}_1 - 1)^2 + 2\alpha^t_1(\lambda^{t-1}_1)^2 - 1 - 4\lambda^{t-1}_1\left(\beta^t_1 - \|\phi^{t-1}_1\|^4_4\right) - 2\lambda^{t-1}_1\left(\lambda^{t-1}_1 - 1 + 2\|\phi^{t-1}_1\|^4_4 - 2\beta^t_1\right) + (\lambda^{t-1}_1)^2 = 2\alpha^t_1(\lambda^{t-1}_1)^2.$$


If we now look at the last term of (2.33), we have that

$$\mathrm{Cov}\left(r_{t,i}r_{t,j} - E_{i,j},\; \lambda^{t-1}_1[\delta r_t]^2_{k,l} - [E_{t-1}]_{k,l}\right) = \begin{cases}
2\lambda^{t-1}_1E^2_{i,i}, & \text{if } i = j = k = l \\
2\lambda^{t-1}_1E^2_{i,k}, & \text{if } i = j \neq k = l \\
2\lambda^{t-1}_1E_{i,k}E_{k,k}, & \text{if } i \neq j = k = l \\
2\lambda^{t-1}_1E_{j,k}E_{k,k}, & \text{if } j \neq i = k = l \\
2\lambda^{t-1}_1E_{i,k}E_{j,k}, & \text{if } i \neq j \neq k = l \\
0, & \text{otherwise,}
\end{cases}$$

from which we get that

$$\sum_{i,j,k,l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{1,j}\phi^{t-1}_{1,k}\phi^{t-1}_{1,l}(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\left[\mathrm{Cov}\left(r_{t,i}r_{t,j} - E_{i,j},\, \lambda^{t-1}_1[\delta r_t]^2_{k,l} - [E_{t-1}]_{k,l}\right) + \mathrm{Cov}\left(r_{t,k}r_{t,l} - E_{k,l},\, \lambda^{t-1}_1[\delta r_t]^2_{i,j} - [E_{t-1}]_{i,j}\right)\right]$$
$$= 4\lambda^{t-1}_1\alpha^t_1 + 4\lambda^{t-1}_1\left(\gamma^t_1 - 2\beta^t_1 + 2\|\phi^{t-1}_1\|^4_4 - \alpha^t_1\right) + 8\lambda^t_1\left(\beta^t_1 - \|\phi^{t-1}_1\|^4_4\right) = 4\gamma^t_1\lambda^{t-1}_1.$$


We can now insert all the separate results back into (2.32) to conclude that

$$\overline{\langle\phi^{t-1}_1|V_{t-1}\zeta_t V_{t-1}|\phi^{t-1}_1\rangle^2} = \overline{\left(\sum_{i=1}^N\sum_{j=1}^N \phi^{t-1}_{1,i}(V_{t-1}\zeta_t V_{t-1})_{i,j}\phi^{t-1}_{1,j}\right)^2} = 2(1 + \alpha^t_1)(\lambda^{t-1}_1)^2 - 4\gamma^t_1\lambda^{t-1}_1.$$

This means that

$$\overline{\langle\phi^{t-1}_1|V_{t-1}\zeta_t V_{t-1}|\phi^{t-1}_1\rangle^2} \approx 2(1 + \alpha_1)(\lambda_1)^2 - 4\gamma_1\lambda_1.$$

2.8.3 Perturbation derivation of covariance eigenvector dynamics

We will in this appendix derive equation (2.19) using perturbation theory. To do this we start by rearranging (2.5) as E_t = E_{t−1} + εη_t, where η_t = |r_t⟩⟨r_t| − E_{t−1}. Using first-order perturbations in ε we get that

$$|\phi^t_1\rangle = \left(1 - \frac{\epsilon^2}{2}\sum_{i\neq 1}\frac{\langle\phi^{t-1}_1|\eta_t|\phi^{t-1}_i\rangle^2}{(\lambda^{t-1}_1 - \lambda^{t-1}_i)^2}\right)|\phi^{t-1}_1\rangle + \epsilon\sum_{i\neq 1}\frac{\langle\phi^{t-1}_1|\eta_t|\phi^{t-1}_i\rangle}{(\lambda^{t-1}_1 - \lambda^{t-1}_i)}|\phi^{t-1}_i\rangle$$
$$\approx \left(1 - \frac{\epsilon^2}{2(\lambda^{t-1}_1)^2}\sum_{i\neq 1}\langle\phi^{t-1}_1|\eta_t|\phi^{t-1}_i\rangle^2\right)|\phi^{t-1}_1\rangle + \frac{\epsilon}{\lambda^{t-1}_1}\sum_{i\neq 1}\langle\phi^{t-1}_1|\eta_t|\phi^{t-1}_i\rangle|\phi^{t-1}_i\rangle. \tag{2.34}$$

Next we project (2.34) onto ⟨φ₁| to get that

$$\cos(\theta_t) = \left(1 - \frac{\epsilon^2}{2(\lambda^{t-1}_1)^2}\sum_{i\neq 1}\langle\phi^{t-1}_1|\eta_t|\phi^{t-1}_i\rangle^2\right)\cos(\theta_{t-1}) + \frac{\epsilon}{\lambda^{t-1}_1}\left(\sum_{i\neq 1}\langle\phi^{t-1}_1|\eta_t|\phi^{t-1}_i\rangle\langle\phi_1|\phi^{t-1}_i\rangle\right). \tag{2.35}$$

We now wish to simplify all the above terms as far as possible. Before proceeding with this explicitly, note that the terms ⟨φ₁|φ^{t−1}_i⟩ will be needed. As we will see later, the central quantity in all expressions involving ⟨φ₁|φ^{t−1}_i⟩ is ⟨φ₁|φ^{t−1}_i⟩². We start by


noticing that the eigenvectors form a complete orthogonal basis, and hence |φ₁⟩ is spanned by {|φ^t_i⟩}_{i∈{1,…,N}}, which means that

$$|\phi_1\rangle = \sum_i \langle\phi_1|\phi^t_i\rangle|\phi^t_i\rangle, \tag{2.36}$$

so by projecting ⟨φ₁| onto this and rearranging we get the equation

$$\sin^2(\theta_t) = 1 - \cos^2(\theta_t) = 1 - \langle\phi_1|\phi^t_1\rangle^2 = \sum_{i\neq 1}\langle\phi_1|\phi^t_i\rangle^2. \tag{2.37}$$

We can also consider the first-order perturbation for all the eigenvectors other than the top one, and then project ⟨φ₁| onto these just as in the top case. This gives us, for any h ≠ 1,

$$\langle\phi_1|\phi^t_h\rangle = \left(1 - \frac{\epsilon^2}{2}\sum_{i\neq h}\frac{\langle\phi^{t-1}_h|\eta_t|\phi^{t-1}_i\rangle^2}{(\lambda^{t-1}_h - \lambda^{t-1}_i)^2}\right)\langle\phi_1|\phi^{t-1}_h\rangle + \epsilon\sum_{i\neq h}\frac{\langle\phi^{t-1}_h|\eta_t|\phi^{t-1}_i\rangle}{(\lambda^{t-1}_h - \lambda^{t-1}_i)}\langle\phi_1|\phi^{t-1}_i\rangle.$$

Now we know that ⟨φ₁|φ^{t−1}_1⟩ is substantially bigger than any of the other terms. Hence we can discard all the second-order terms and approximate the above equation as

$$\langle\phi_1|\phi^t_h\rangle - \langle\phi_1|\phi^{t-1}_h\rangle = \epsilon\frac{\langle\phi^{t-1}_h|\eta_t|\phi^{t-1}_1\rangle}{(\lambda^{t-1}_h - \lambda^{t-1}_1)}\langle\phi_1|\phi^{t-1}_1\rangle. \tag{2.38}$$

This can be further projected onto the top eigenvector (using that cos(θ_t) ≈ 1) to get that

$$\frac{\langle\phi^{t-1}_h|\eta_t|\phi^{t-1}_1\rangle}{(\lambda^{t-1}_h - \lambda^{t-1}_1)} \approx \frac{\langle\phi_1|\eta_t|\phi^{t-1}_1\rangle}{(\lambda^{t-1}_h - \lambda^{t-1}_1)}\langle\phi_1|\phi^{t-1}_h\rangle,$$

from which it follows that

$$\overline{\frac{\langle\phi_1|\eta_t|\phi^{t-1}_1\rangle}{(\lambda^{t-1}_h - \lambda^{t-1}_1)}} \approx -\frac{\lambda_1}{(\lambda_1 - \lambda_h)},$$


and

$$\overline{\left(\frac{\langle\phi_1|\eta_t|\phi^{t-1}_h\rangle}{(\lambda^{t-1}_h - \lambda^{t-1}_1)}\right)^2} \approx \frac{\lambda_1\lambda_h}{(\lambda_h - \lambda_1)^2}.$$

Hence, in the continuous limit, we can approximate (2.38) by

$$d\langle\phi_1|\phi^t_h\rangle = -\epsilon\frac{\lambda_1}{(\lambda_1 - \lambda_h)}\langle\phi_1|\phi^t_1\rangle\,dt + \epsilon\frac{\sqrt{\lambda_1\lambda_h}}{(\lambda_1 - \lambda_h)}\,dB_t.$$

From this it follows by Ito's lemma that we have

$$d\langle\phi_1|\phi^t_h\rangle^2 = -2\epsilon\frac{\lambda_1}{(\lambda_1 - \lambda_h)}\langle\phi_1|\phi^t_h\rangle^2\,dt + 2\epsilon\frac{\sqrt{\lambda_1\lambda_h}}{(\lambda_h - \lambda_1)}\langle\phi_1|\phi^t_h\rangle\,dB_t + \epsilon^2\frac{\lambda_1\lambda_h}{(\lambda_h - \lambda_1)^2}\,dt$$
$$= \frac{2\epsilon\lambda_1}{(\lambda_1 - \lambda_h)}\left(\frac{\epsilon\lambda_h}{2(\lambda_1 - \lambda_h)} - \langle\phi_1|\phi^t_h\rangle^2\right)dt + 2\epsilon\frac{\sqrt{\lambda_1\lambda_h}}{(\lambda_h - \lambda_1)}\langle\phi_1|\phi^t_h\rangle\,dB_t,$$

so we can see that $\overline{\langle\phi_1|\phi^t_h\rangle^2} \approx \epsilon\frac{\lambda_h}{2(\lambda_1 - \lambda_h)}$. To make the approximation more accurate and consistent, we will use the criterion (2.37) to normalize this quantity. From this we get our final approximation as

$$\langle\phi_1|\phi^t_h\rangle^2 \approx \sin^2(\theta_t)\,\frac{\dfrac{\lambda_h}{(\lambda_1 - \lambda_h)}}{\sum_{i\neq 1}\dfrac{\lambda_i}{(\lambda_1 - \lambda_i)}}. \tag{2.39}$$

We will now proceed with the dynamics of the top projector. Notice first that the eigenvectors form a complete orthogonal basis (which eliminates all dependence on E_{t−1}), to get that

$$\overline{\sum_{i\neq 1}\langle\phi^{t-1}_1|\eta_t|\phi^{t-1}_i\rangle\langle\phi_1|\phi^{t-1}_i\rangle} = \cos(\theta_{t-1})\sum_{i\neq 1}\overline{\langle\phi_1|r_tr_t^*|\phi_1\rangle}\,\langle\phi_1|\phi^{t-1}_i\rangle^2 = \lambda_1\sin^2(\theta_{t-1})\cos^2(\theta_{t-1}).$$

ÿ

i”=1

È„t≠1

1

|rtrtú|„t≠1

i Í2 ¥ cos2(◊t≠1

)ÿ

i”=1

È„1

|rtrtú|„t≠1

i Í2 ¥ cos2(◊t≠1

)⁄1

ÿ

i”=1

⁄i.


We also need the second moment of the first-order term. To get this we will have to use (2.39). Notice that this differs substantially from the assumption of [1], in that they implicitly assume that the orthogonal vector in the decomposition (2.8) of |φ^t_1⟩ is equally spread out in the space spanned by the remaining eigenvectors and that the variability with respect to each single eigenvector is small, i.e. they assume that ⟨φ^{t−1}_⊥|φ^{t−1}_j⟩² ≈ 1/(N−1). Now

$$\overline{\left(\sum_{i\neq 1}\langle\phi^{t-1}_1|\eta_t|\phi^{t-1}_i\rangle\langle\phi_1|\phi^{t-1}_i\rangle\right)^2} \approx \cos^2(\theta_{t-1})\sum_{i\neq 1}\sum_{j\neq 1}\overline{\langle\phi_1|r_tr_t^*|\phi_1\rangle^2}\,\langle\phi_1|\phi^{t-1}_i\rangle^2\langle\phi_1|\phi^{t-1}_j\rangle^2$$
$$+ \cos^2(\theta_{t-1})\sum_{i\neq 1}\overline{\langle\phi_1|r_tr_t^*|\phi^{t-1}_i\rangle^2}\,\langle\phi_1|\phi^{t-1}_i\rangle^2$$
$$\approx 2\lambda^2_1\cos^2(\theta_{t-1})\sin^4(\theta_{t-1}) + \lambda_1\cos^2(\theta_{t-1})\sin^2(\theta_{t-1})\,\frac{\sum_{i\neq 1}\frac{\lambda^2_i}{(\lambda_1-\lambda_i)}}{\sum_{i\neq 1}\frac{\lambda_i}{(\lambda_1-\lambda_i)}}.$$

By inserting these approximations into (2.35) we end up with the SDE

$$d\cos(\theta_t) = -\frac{\epsilon^2}{2(\lambda^t_1)^2}\left[\cos^2(\theta_t)\lambda_1\sum_{i\neq 1}\lambda_i\right]\cos(\theta_t)\,dt + \epsilon\sin^2(\theta_t)\cos(\theta_t)\,dt + \sigma_t\,dB_t,$$

where

$$\sigma^2_t = \frac{\epsilon^2}{\lambda^2_1}\left[2\lambda^2_1\sin^2(\theta_t)\cos^2(\theta_t) + \lambda_1\cos^2(\theta_{t-1})\,\frac{\sum_{i\neq 1}\frac{\lambda^2_i}{(\lambda_1-\lambda_i)}}{\sum_{i\neq 1}\frac{\lambda_i}{(\lambda_1-\lambda_i)}}\right]\sin^2(\theta_t).$$

Remember now that x_t = 1 − cos(θ_t) and use that cos²(θ_t) ≈ 1 − 2x_t and sin²(θ_t) ≈ 2x_t to get that

$$dx_t = 2\epsilon(\mu_{cov} - x_t)\,dt + \epsilon\sqrt{2x_t(4x_t + c_{cov})}\,dB_t,$$

where

$$\mu_{cov} = \frac{\epsilon\sum_{i\neq 1}\lambda_i}{4\lambda_1} \quad\text{and}\quad c_{cov} = \frac{\sum_{i\neq 1}\frac{\lambda^2_i}{(\lambda_1-\lambda_i)}}{\lambda_1\sum_{i\neq 1}\frac{\lambda_i}{(\lambda_1-\lambda_i)}}.$$


2.8.4 Eigenvector overlap for covariance matrices

We here wish to calculate ⟨φ^t_1|φ^{t+τ}_1⟩. We thus consider the empirical covariance matrix as a perturbation of the exact covariance matrix, as

$$E_t = E + \bar{E}_t, \tag{2.40}$$

where

$$\bar{E}_t = \epsilon\sum_{\tau=0}^{\infty}(1-\epsilon)^\tau\left(|r_{t-\tau}\rangle\langle r_{t-\tau}| - E\right). \tag{2.41}$$
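The geometric-sum representation (2.41) is simply the unrolled EWMA recursion E_t = (1 − ε)E_{t−1} + ε|r_t⟩⟨r_t|, which can be checked directly (arbitrary data; the initial condition E₀ = 0 replaces the fully decayed history):

```python
import numpy as np

# The EWMA recursion E_t = (1-eps)E_{t-1} + eps*r_t r_t^T unrolls into the
# geometric sum eps * sum_tau (1-eps)^tau r_{t-tau} r_{t-tau}^T.
rng = np.random.default_rng(7)
eps, n, d = 0.05, 300, 3
R = rng.standard_normal((n, d))

E = np.zeros((d, d))
for r in R:
    E = (1 - eps) * E + eps * np.outer(r, r)

weights = eps * (1 - eps) ** np.arange(n - 1, -1, -1)     # oldest ... newest
explicit = sum(w * np.outer(r, r) for w, r in zip(weights, R))
print(np.abs(E - explicit).max())   # floating-point noise only
```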

By standard perturbation analysis we thus get that

$$|\phi^t_1\rangle = \left(1 - \frac{1}{2}\sum_{i\neq 1}\frac{\langle\phi_1|\bar{E}_t|\phi_i\rangle^2}{(\lambda_1 - \lambda_i)^2}\right)|\phi_1\rangle + \sum_{i\neq 1}\frac{\langle\phi_1|\bar{E}_t|\phi_i\rangle}{(\lambda_1 - \lambda_i)}|\phi_i\rangle$$
$$\approx \left(1 - \frac{1}{2(\lambda_1)^2}\sum_{i\neq 1}\langle\phi_1|\bar{E}_t|\phi_i\rangle^2\right)|\phi_1\rangle + \frac{1}{\lambda_1}\sum_{i\neq 1}\langle\phi_1|\bar{E}_t|\phi_i\rangle|\phi_i\rangle. \tag{2.42}$$

If we compare this to (2.8) we can see that

$$\sin(\theta_t)|\phi^t_\perp\rangle = \frac{1}{\lambda_1}\sum_{i\neq 1}\langle\phi_1|\bar{E}_t|\phi_i\rangle|\phi_i\rangle,$$

so if we project |φ^{t+τ}_1⟩ onto (2.8) we get that

$$\langle\phi^t_1|\phi^{t+\tau}_1\rangle = \cos(\theta_t)\cos(\theta_{t+\tau}) + \sin(\theta_t)\sin(\theta_{t+\tau})\langle\phi^{t+\tau}_\perp|\phi^t_\perp\rangle$$
$$= (1 - x_t)(1 - x_{t+\tau}) + \frac{1}{(\lambda_1)^2}\sum_{i\neq 1}\langle\phi_1|\bar{E}_t|\phi_i\rangle\langle\phi_1|\bar{E}_{t+\tau}|\phi_i\rangle. \tag{2.43}$$

We next want to calculate the time average of this quantity. To do this we first notice that

$$\overline{(1 - x_t)(1 - x_{t+\tau})} = 1 - \overline{x_{t+\tau}} - \overline{x_t} + \overline{x_tx_{t+\tau}} \approx 1 - 2\mu_{cov},$$


as the x_tx_{t+τ} term is negligible. From the definition of Ē_t we have that

$$\bar{E}_{t+\tau} = (1-\epsilon)^\tau \bar{E}_t + \epsilon\sum_{s=0}^{\tau-1}(1-\epsilon)^s\left(|r_{t+\tau-s}\rangle\langle r_{t+\tau-s}| - E\right). \tag{2.44}$$

It is obvious that $\overline{\langle\phi_1|\bar{E}_t|\phi_i\rangle} = 0$, and we also have that

$$\overline{\langle\phi_1|\bar{E}_t|\phi_i\rangle^2} = \epsilon^2\sum_{s=0}^{\infty}(1-\epsilon)^{2s}\,\overline{\langle\phi_1|\left(|r_s\rangle\langle r_s|\right)|\phi_i\rangle^2} = \lambda_1\lambda_i\,\epsilon^2\sum_{s=0}^{\infty}\left((1-\epsilon)^2\right)^s = \frac{\lambda_1\lambda_i\,\epsilon^2}{2\epsilon - \epsilon^2} \approx \epsilon\lambda_1\lambda_i/2,$$

so from this we get

$$\overline{\sum_{i\neq 1}\langle\phi_1|\bar{E}_t|\phi_i\rangle\langle\phi_1|\bar{E}_{t+\tau}|\phi_i\rangle} = (1-\epsilon)^\tau\sum_{i\neq 1}\overline{\langle\phi_1|\bar{E}_t|\phi_i\rangle^2} \approx (1-\epsilon)^\tau\,\frac{\epsilon}{2}\,\lambda_1\sum_{i\neq 1}\lambda_i.$$

It also follows that $\overline{\langle\phi_1|\bar{E}_t|\phi_i\rangle\langle\phi_1|\left(|r_{t+\tau-s}\rangle\langle r_{t+\tau-s}| - E\right)|\phi_i\rangle} \approx 0$ when s > 0, as the returns are independent in time. If we also use that (1 − ε)^τ → e^{−ετ} as ε → 0 and insert everything into (2.43), we end up with

$$\overline{\langle\phi^t_1|\phi^{t+\tau}_1\rangle} \approx 1 - 2\mu_{cov} + e^{-\epsilon\tau}\,\frac{\epsilon}{(\lambda_1)^2}\,\frac{\lambda_1}{2}\sum_{i\neq 1}\lambda_i = 1 - 2\mu_{cov}\left(1 - e^{-\epsilon\tau}\right). \tag{2.45}$$

2.8.5 Perturbation derivation of correlation eigenvector dynamics

We will now use the same approach as in Appendix 2.8.3, but for the correlation dynamics examined in Section 2.3.2. This means using (2.35) and replacing η_t with the value corresponding to the correlation matrix. We have that

$$C_t = C_{t-1} + \epsilon\left[\frac{1}{2}\left(I - V_{t-1}\delta r^2_t V_{t-1}\right)C_{t-1} + \frac{1}{2}C_{t-1}\left(I - V_{t-1}\delta r^2_t V_{t-1}\right) + V_t|r_t\rangle\langle r_t|V_t - C_{t-1}\right]. \tag{2.46}$$


Hence define

$$\zeta_t = \frac{1}{2}\left(I - V_{t-1}\delta r^2_t V_{t-1}\right)C_{t-1} + \frac{1}{2}C_{t-1}\left(I - V_{t-1}\delta r^2_t V_{t-1}\right) + V_t|r_t\rangle\langle r_t|V_t - C_{t-1}$$

and use this instead of η_t in (2.35), i.e.

$$\cos(\theta_t) = \left(1 - \frac{\epsilon^2}{2(\lambda^{t-1}_1)^2}\sum_{i\neq 1}\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_i\rangle^2\right)\cos(\theta_{t-1}) + \frac{\epsilon}{\lambda^{t-1}_1}\left(\sum_{i\neq 1}\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_i\rangle\langle\phi_1|\phi^{t-1}_i\rangle\right). \tag{2.47}$$

For the products involving the first eigenvector we have that

$$\overline{\sum_{i\neq 1}\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_i\rangle\langle\phi_1|\phi^{t-1}_i\rangle} \approx \cos(\theta_{t-1})\sum_{i\neq 1}\overline{\langle\phi_1|V_t|r_t\rangle\langle r_t|V_t|\phi_1\rangle}\,\langle\phi_1|\phi^{t-1}_i\rangle^2 = \lambda_1\sin^2(\theta_{t-1})\cos^2(\theta_{t-1}).$$

We next look at the cross moment between the first eigenvector and any other eigenvector, indexed by h. It follows that

$$\phi^T_{1,t-1}\Delta V_t E_{t-1} V_{t-1}\phi_{h,t-1} \approx \frac{1}{2}\epsilon\,\phi^T_{1,t-1}\left(I - V_t\delta r^2_t V_t\right)V_{t-1}E_{t-1}V_{t-1}\phi_{h,t-1} = -\frac{1}{2}\epsilon\,\lambda^{t-1}_h\left[\delta r_t V_t\phi_{1,t-1}\right]^T\left[\delta r_t V_t\phi_{h,t-1}\right],$$

and

$$\phi^T_{1,t-1}V_{t-1}E_{t-1}\Delta V_t\phi_{h,t-1} \approx -\frac{1}{2}\epsilon\,\lambda^{t-1}_1\left[\delta r_t V_t\phi_{1,t-1}\right]^T\left[\delta r_t V_t\phi_{h,t-1}\right].$$

If we thus define

$$-\frac{1}{2}\left(\lambda^{t-1}_1 + \lambda^{t-1}_h\right)V_{t-1}\delta r^2_t V_{t-1} + V_t|r_t\rangle\langle r_t|V_t - C_{t-1} \approx -\frac{1}{2}\left(\lambda^{t-1}_1 + \lambda^{t-1}_h\right)V_{t-1}\delta r^2_t V_{t-1} + V_{t-1}|r_t\rangle\langle r_t|V_{t-1} - C_{t-1} \equiv V_{t-1}\zeta_t V_{t-1},$$


we can see that

$$\overline{\zeta_{i,j}\zeta_{k,l}} = \overline{(r_{t,i}r_{t,j} - E_{i,j})(r_{t,k}r_{t,l} - E_{k,l})} + \frac{1}{4}\left(\lambda^{t-1}_1 + \lambda^{t-1}_h\right)^2\overline{[\delta r_t]^2_{i,j}[\delta r_t]^2_{k,l}}$$
$$- \frac{1}{2}\mathrm{Cov}\left(r_{t,i}r_{t,j} - E_{i,j},\; (\lambda^{t-1}_1 + \lambda^{t-1}_h)[\delta r_t]^2_{k,l}\right) - \frac{1}{2}\mathrm{Cov}\left(r_{t,k}r_{t,l} - E_{k,l},\; (\lambda^{t-1}_1 + \lambda^{t-1}_h)[\delta r_t]^2_{i,j}\right).$$

Next we calculate the variance of the product of ζ_t with φ^{t−1}_1 and another eigenvector φ^{t−1}_h, where h ≠ 1. We see that

$$\overline{\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_h\rangle^2} = \sum_{i,j,k,l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{h,j}\phi^{t-1}_{1,k}\phi^{t-1}_{h,l}(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\,\overline{(r_{t,i}r_{t,j} - E_{i,j})(r_{t,k}r_{t,l} - E_{k,l})}$$
$$+ \frac{1}{4}\sum_{i,j,k,l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{h,j}\phi^{t-1}_{1,k}\phi^{t-1}_{h,l}(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\left(\lambda^{t-1}_1 + \lambda^{t-1}_h\right)^2\overline{[\delta r_t]^2_{i,j}[\delta r_t]^2_{k,l}}$$
$$- \frac{1}{2}\sum_{i,j,k,l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{h,j}\phi^{t-1}_{1,k}\phi^{t-1}_{h,l}(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\Big[\mathrm{Cov}\left(r_{t,i}r_{t,j} - E_{i,j},\,(\lambda^{t-1}_1 + \lambda^{t-1}_h)[\delta r_t]^2_{k,l}\right)$$
$$+ \mathrm{Cov}\left(r_{t,k}r_{t,l} - E_{k,l},\,(\lambda^{t-1}_1 + \lambda^{t-1}_h)[\delta r_t]^2_{i,j}\right)\Big]. \tag{2.48}$$

Just as in the previous section, the first term equals cos²(θ)λ^{t−1}_1λ^{t−1}_h. The second term satisfies

$$\overline{[\delta r_t]^2_{i,j}[\delta r_t]^2_{k,l}} = \begin{cases}
3E^2_{i,i}, & \text{if } i = j = k = l, \\
2E^2_{i,k} + E_{i,i}E_{k,k}, & \text{if } i = j \neq k = l, \\
0, & \text{otherwise,}
\end{cases}$$


which gives us that
\[
\begin{aligned}
&\frac{1}{4}\sum_{i=1}^N\sum_{j=1}^N\sum_{k=1}^N\sum_{l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{h,j}\phi^{t-1}_{1,k}\phi^{t-1}_{h,l}(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\left(\lambda^{t-1}_1+\lambda^{t-1}_h\right)^2[\delta r_t]^2_{i,j}[\delta r_t]^2_{k,l}\\
&\qquad\approx \frac{\left(\lambda^{t-1}_1+\lambda^{t-1}_h\right)^2}{2}\sum_{i=1}^N\sum_{k=1}^N\left[\phi^{t-1}_{1,i}(C_{t-1})_{i,k}\phi^{t-1}_{1,k}\right]\left[\phi^{t-1}_{h,i}(C_{t-1})_{i,k}\phi^{t-1}_{h,k}\right] \equiv \frac{\left(\lambda^{t-1}_1+\lambda^{t-1}_h\right)^2}{2}\,\alpha_h.
\end{aligned}
\]

The covariance term of (2.48) takes the same form as in Appendix 2.8.2, up to a different scaling factor; hence we have
\[
\begin{aligned}
&\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\sum_{k=1}^N\sum_{l=1}^N \phi^{t-1}_{1,i}\phi^{t-1}_{h,j}\phi^{t-1}_{1,k}\phi^{t-1}_{h,l}(V_{t-1})_i(V_{t-1})_j(V_{t-1})_k(V_{t-1})_l\\
&\qquad\cdot\Big[\operatorname{Cov}\left(r_{t,i}r_{t,j}-E_{i,j},\,(\lambda^{t-1}_1+\lambda^{t-1}_h)[\delta r_t]^2_{k,l}\right) + \operatorname{Cov}\left(r_{t,k}r_{t,l}-E_{k,l},\,(\lambda^{t-1}_1+\lambda^{t-1}_h)[\delta r_t]^2_{i,j}\right)\Big]\\
&\quad= 2\left(\lambda^{t-1}_1+\lambda^{t-1}_h\right)\sum_{i=1}^N\sum_{j=1}^N\sum_{k=1}^N\left[\phi^{t-1}_{1,i}C_{i,k}\phi^{t-1}_{1,k}\right]\left[\phi^{t-1}_{h,j}C_{j,k}\phi^{t-1}_{h,k}\right] \equiv 2\gamma_h\left(\lambda^{t-1}_1+\lambda^{t-1}_h\right).
\end{aligned}
\]

So putting this together we get that
\[
\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_h\rangle^2 \approx \langle\phi^{t-1}_1|V_{t-1}\zeta_tV_{t-1}|\phi^{t-1}_h\rangle^2 \approx \cos^2(\theta)\lambda_1\lambda_h + \frac{(\lambda_1+\lambda_h)^2}{2}\,\alpha_h - 2\gamma_h(\lambda_1+\lambda_h). \tag{2.49}
\]

We can now use this to calculate our second quantity of interest as
\[
\sum_{i\neq 1}\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_i\rangle^2 \approx \cos^2(\theta)\lambda_1\sum_{i\neq 1}\lambda_i + \sum_{i\neq 1}\frac{(\lambda_1+\lambda_i)^2}{2}\,\alpha_i - 2\sum_{i\neq 1}\gamma_i(\lambda_1+\lambda_i).
\]

For the second-order terms we will also need terms of the form $\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_k\rangle\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_l\rangle$. In the covariance case we deduced that these have zero mean, but that is not the case here because of the correction terms. However, since the terms by which these are multiplied are very small, the total effect will be of reasonably small magnitude. A more complex model could take these into


consideration; here, however, we choose to approximate them by 0. As for the square of the first-order term, we note that, just as for the covariance, we here have that
\[
\langle\phi_1|\phi^t_h\rangle^2 \approx \epsilon\,\frac{\langle\phi_1|\zeta_t|\phi^{t-1}_h\rangle^2}{2(\lambda_1-\lambda_h)\langle\phi_1|\zeta_t|\phi^{t-1}_1\rangle} \approx \epsilon\,\frac{\cos^2(\theta)\lambda_1\lambda_h + \frac{(\lambda_1+\lambda_h)^2}{2}\alpha_h - 2\gamma_h(\lambda_1+\lambda_h)}{2(\lambda_1-\lambda_h)\lambda_1}.
\]

So by again using the orthogonality of the eigenvectors we get that
\[
\begin{aligned}
\langle\phi_1|\phi^t_h\rangle^2 &\approx \sin^2(\theta_t)\,\frac{\dfrac{\cos^2(\theta)\lambda_1\lambda_h+\frac{(\lambda_1+\lambda_h)^2}{2}\alpha_h-2\gamma_h(\lambda_1+\lambda_h)}{2(\lambda_1-\lambda_h)\lambda_1}}{\displaystyle\sum_{i\neq 1}\frac{\cos^2(\theta)\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)}{2(\lambda_1-\lambda_i)\lambda_1}}\\
&= \sin^2(\theta_t)\,\frac{\dfrac{\cos^2(\theta)\lambda_1\lambda_h+\frac{(\lambda_1+\lambda_h)^2}{2}\alpha_h-2\gamma_h(\lambda_1+\lambda_h)}{\lambda_1-\lambda_h}}{\displaystyle\sum_{i\neq 1}\frac{\cos^2(\theta)\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)}{\lambda_1-\lambda_i}}.
\end{aligned}
\tag{2.50}
\]

By now using this we can calculate the second moment as
\[
\begin{aligned}
\Bigg(\sum_{i\neq 1}\langle\phi^{t-1}_1|\zeta_t|\phi^{t-1}_i\rangle\langle\phi_1|\phi^{t-1}_i\rangle\Bigg)^2 &\approx \cos^2(\theta^{t-1})\sum_{i\neq 1}\sum_{j\neq 1}\langle\phi_1|\zeta_t|\phi_1\rangle^2\langle\phi_1|\phi^{t-1}_i\rangle^2\langle\phi_1|\phi^{t-1}_j\rangle^2\\
&\quad+ \cos^2(\theta^{t-1})\sum_{i\neq 1}\langle\phi_1|\zeta_t|\phi^{t-1}_i\rangle^2\langle\phi_1|\phi^{t-1}_i\rangle^2\\
&\approx \sin^4(\theta^{t-1})\cos^2(\theta^{t-1})\left[2(1+\alpha_1)(\lambda_1)^2 - 4\gamma_1\lambda_1\right]\\
&\quad+ \sin^2(\theta^{t-1})\cos^2(\theta^{t-1})\,\frac{\displaystyle\sum_{i\neq 1}\frac{\left[\cos^2(\theta)\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)\right]^2}{\lambda_1-\lambda_i}}{\displaystyle\sum_{i\neq 1}\frac{\cos^2(\theta)\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)}{\lambda_1-\lambda_i}}.
\end{aligned}
\]


Putting all of this together we get that $\cos(\theta_t)$ satisfies the SDE
\[
\begin{aligned}
d\cos(\theta_t) = {}&-\frac{\epsilon^2}{2(\lambda_1)^2}\left[\cos^2(\theta)\lambda_1\sum_{i\neq 1}\lambda_i + \sum_{i\neq 1}\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i - 2\sum_{i\neq 1}\gamma_i(\lambda_1+\lambda_i)\right]dt\\
&+ \epsilon\sin^2(\theta^{t-1})\cos(\theta^{t-1})\,dt + \sigma_t\,dB_t,
\end{aligned}
\]

where
\[
\sigma^2_t = \frac{\epsilon^2}{(\lambda_1)^2}\left(\sin^2(\theta^{t-1})\left[2(1+\alpha_1)(\lambda_1)^2-4\gamma_1\lambda_1\right] + \frac{\displaystyle\sum_{i\neq 1}\frac{\left[\cos^2(\theta)\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)\right]^2}{\lambda_1-\lambda_i}}{\displaystyle\sum_{i\neq 1}\frac{\cos^2(\theta)\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)}{\lambda_1-\lambda_i}}\right)\sin^2(\theta_t)\cos^2(\theta^{t-1}).
\]

If we set $x_t = 1-\cos(\theta_t)$ and use that $\cos^2(\theta_t)\approx 1-2x_t$ and $\sin^2(\theta_t)\approx 2x_t$ we get that
\[
dx_t = 2\epsilon\,(\mu_{\mathrm{corr}} - x_t)\,dt + \epsilon\, b_{\mathrm{corr}}\sqrt{2x_t\,(4x_t + c_{\mathrm{corr}})}\,dB_t,
\]

where
\[
\mu_{\mathrm{corr}} = \frac{\epsilon}{4}\left[\sum_{i\neq 1}\frac{\lambda_i}{\lambda_1} + \sum_{i\neq 1}\frac{\left(1+\frac{\lambda_i}{\lambda_1}\right)^2}{2}\,\alpha_i - \frac{2}{\lambda_1}\sum_{i\neq 1}\gamma_i\left(1+\frac{\lambda_i}{\lambda_1}\right)\right],
\]

\[
b_{\mathrm{corr}} = \sqrt{1+\alpha_1-\frac{2\gamma_1}{\lambda_1}} \quad\text{and}\quad c_{\mathrm{corr}} = \frac{\displaystyle\sum_{i\neq 1}\frac{\left[\cos^2(\theta)\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)\right]^2}{\lambda_1-\lambda_i}}{\displaystyle\lambda_1^2\, b_{\mathrm{corr}}^2 \sum_{i\neq 1}\frac{\cos^2(\theta)\lambda_1\lambda_i+\frac{(\lambda_1+\lambda_i)^2}{2}\alpha_i-2\gamma_i(\lambda_1+\lambda_i)}{\lambda_1-\lambda_i}}.
\]
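The constants $\alpha_h$, $\gamma_h$ and $\mu_{\mathrm{corr}}$ are determined entirely by the spectrum of the true correlation matrix. As a sketch of how they can be evaluated numerically (the function name and interface are ours; the thesis gives no code), following the sums defined above:

```python
import numpy as np

def corr_sde_constants(C, eps):
    """Sketch: evaluate alpha_h, gamma_h and mu_corr for a true correlation
    matrix C, following the sums in this appendix. Interface is hypothetical."""
    lam, phi = np.linalg.eigh(C)
    idx = np.argsort(lam)[::-1]              # sort eigenvalues descending
    lam, phi = lam[idx], phi[:, idx]
    # M[h] has entries phi_{h,i} C_{i,k} phi_{h,k}
    M = [np.outer(phi[:, h], phi[:, h]) * C for h in range(C.shape[0])]
    # alpha_h = sum_{i,k} [phi_{1,i} C_{i,k} phi_{1,k}] [phi_{h,i} C_{i,k} phi_{h,k}]
    alpha = np.array([np.sum(M[0] * M[h]) for h in range(C.shape[0])])
    # gamma_h = sum_{i,j,k} [phi_{1,i} C_{i,k} phi_{1,k}] [phi_{h,j} C_{j,k} phi_{h,k}]
    gamma = np.array([np.sum(M[0].sum(axis=0) * M[h].sum(axis=0))
                      for h in range(C.shape[0])])
    r = lam[1:] / lam[0]
    mu_corr = (eps / 4.0) * (r.sum() + 0.5 * np.sum((1 + r) ** 2 * alpha[1:])
                             - (2.0 / lam[0]) * np.sum(gamma[1:] * (1 + r)))
    return lam, alpha, gamma, mu_corr
```

Here `eps` is the EWMA parameter $\epsilon$; note that the drift level $\mu_{\mathrm{corr}}$ is itself proportional to $\epsilon$, matching the slow mean reversion of $x_t$.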

2.8.6 Eigenvector overlap for correlation matrices

We will here calculate $\langle\phi^t_1|\phi^{t+\tau}_1\rangle$ for correlation matrices in a similar fashion as done for covariance matrices in Appendix 2.8.4. Thus we consider the empirical correlation


matrix as a perturbation of the exact correlation matrix as
\[
C_t = C + \Delta C_t. \tag{2.51}
\]
Also define the volatility perturbation as
\[
V_t = V + \Delta V_t.
\]
By using a Taylor expansion in a similar way as in Section 2.3.2 we can see that
\[
\Delta V_t = V_t - V \approx \tfrac{1}{2}\epsilon\, V\left(I - V^{-2}(V_t)^2\right),
\]
and thus $\Delta V_t$ is of order $\epsilon$. By using (2.40) and the above decomposition we can see that
\[
C_t = V_t E V_t + V_t\,\Delta E_t\, V_t = C + V E\,\Delta V_t + \Delta V_t E V + \Delta V_t E\,\Delta V_t + V_t\,\Delta E_t\, V_t, \tag{2.52}
\]
so from this we can identify, by comparing with (2.51) and keeping at most first-order terms, that
\[
\Delta C_t \approx V E\,\Delta V_t + \Delta V_t E V + V_t\,\Delta E_t\, V_t. \tag{2.53}
\]

By applying (2.46) iteratively we get that
\[
\Delta C_t = \epsilon\sum_{\tau=0}^{\infty}(1-\epsilon)^{\tau}\Big[\tfrac{1}{2}\left(I - V_{t-\tau-1}\,\delta r^2_{t-\tau}\,V_{t-\tau-1}\right)C_{t-\tau-1} + \tfrac{1}{2}C_{t-\tau-1}\left(I - V_{t-\tau-1}\,\delta r^2_{t-\tau}\,V_{t-\tau-1}\right) + V_{t-\tau}|r_{t-\tau}\rangle\langle r_{t-\tau}|V_{t-\tau} - C\Big].
\]

By using the same type of perturbations as in Appendix 2.8.4 we can see that
\[
\langle\phi^t_1|\phi^{t+\tau}_1\rangle = 1 - 2\mu_{\mathrm{corr}} + \frac{1}{(\lambda_1)^2}\sum_{i\neq 1}\langle\phi_1|\Delta C_t|\phi_i\rangle\langle\phi_1|\Delta C_{t+\tau}|\phi_i\rangle. \tag{2.54}
\]

We now want to express $\Delta C_{t+\tau}$ in terms of $\Delta C_t$. To this end we first look at how the update of the volatility changes in time. In order to do this we again have to use a


Taylor expansion, so
\[
V_{t+\tau} \approx V_t + \tfrac{1}{2}\epsilon\, V_t\sum_{i=0}^{\tau-1}(1-\epsilon)^i\left[I - (V_t)^{-2}\,\delta r^2_{t+\tau-i}\right].
\]
From the definition of $\Delta V_t$ this directly implies that
\[
\Delta V_{t+\tau} \approx \Delta V_t + \tfrac{1}{2}\epsilon\, V_t\sum_{i=0}^{\tau-1}(1-\epsilon)^i\left[I - (V_t)^{-2}\,\delta r^2_{t+\tau-i}\right].
\]

Now define $\Delta E_{t+\tau,t} = \epsilon\sum_{s=0}^{\tau-1}(1-\epsilon)^s\left(|r_{t+\tau-s}\rangle\langle r_{t+\tau-s}| - E\right)$ and $\Delta V_{t+\tau,t} = \tfrac{1}{2}\epsilon\, V_t\sum_{i=0}^{\tau-1}(1-\epsilon)^i\left[I - (V_t)^{-2}\,\delta r^2_{t+\tau-i}\right]$. By plugging this into (2.53) and only keeping first-order terms, we get that

\[
\begin{aligned}
\Delta C_{t+\tau} &= V E\,\Delta V_{t+\tau} + \Delta V_{t+\tau}EV + V_{t+\tau}\,\Delta E_{t+\tau}\,V_{t+\tau}\\
&\approx V E\,\Delta V_t + V E\,\Delta V_{t+\tau,t} + \Delta V_t E V + \Delta V_{t+\tau,t} E V + (1-\epsilon)^{\tau}\,V_t\,\Delta E_t\,V_t + V_t\,\Delta E_{t+\tau,t}\,V_t\\
&\approx (1-\epsilon)^{\tau}\,\Delta C_t + 2\epsilon\tau\, V E\,\Delta V_t + V E\,\Delta V_{t+\tau,t} + \Delta V_{t+\tau,t} E V + V_{t+\tau}\,\Delta E_{t+\tau,t}\,V_{t+\tau}.
\end{aligned}
\tag{2.55}
\]

It is again obvious that $\langle\phi_1|\Delta C_t|\phi_h\rangle = 0$ on average for any $h\neq 1$, and we also have that
\[
\langle\phi_1|\Delta C_t|\phi_h\rangle^2 = \left[\lambda_1\lambda_h + \frac{(\lambda_1+\lambda_h)^2}{2}\alpha_h - 2\gamma_h(\lambda_1+\lambda_h)\right]\epsilon^2\sum_{s=0}^{\infty}\left((1-\epsilon)^2\right)^s \approx \frac{\epsilon}{2}\left[\lambda_1\lambda_h + \frac{(\lambda_1+\lambda_h)^2}{2}\alpha_h - 2\gamma_h(\lambda_1+\lambda_h)\right].
\]

Compared to the covariance case, we can see from (2.55) that $\Delta C_{t+\tau}$ is now more complicated. However, the impact of all the terms besides $(1-\epsilon)^{\tau}\Delta C_t$, when coupled with the $\langle\phi_1|\Delta C_t|\phi_h\rangle$ term, is again small due to the independence between past and future returns. Hence we also here get that

\[
\frac{1}{(\lambda_1)^2}\sum_{i\neq 1}\langle\phi_1|\Delta C_t|\phi_i\rangle\langle\phi_1|\Delta C_{t+\tau}|\phi_i\rangle = \frac{(1-\epsilon)^{\tau}}{(\lambda_1)^2}\sum_{i\neq 1}\langle\phi_1|\Delta C_t|\phi_i\rangle^2 \approx (1-\epsilon)^{\tau}\,2\mu_{\mathrm{corr}},
\]


and thus we can also in the correlation case conclude that
\[
\langle\phi^t_1|\phi^{t+\tau}_1\rangle \approx 1 - 2\mu_{\mathrm{corr}}\left(1 - e^{-\epsilon\tau}\right).
\]

2.8.7 Stationary distribution for $x_t$

In general the stationary distribution $p_\infty(x)$ of an Itô diffusion satisfies the equation
\[
\mathcal{L}^* p_\infty = 0,
\]
where $\mathcal{L}^*$ is the adjoint of the generator $\mathcal{L}$ of the process [45]. We now want to solve this for processes of the form
\[
dx_t = 2\epsilon(\mu - x_t)\,dt + \epsilon b\sqrt{2x_t\,(4x_t+c)}\,dB_t,
\]
where $\mu, b, c\in\mathbb{R}$. By applying the corresponding generator, the resulting ordinary differential equation (ODE) has the solution

\[
p_\infty(x) = a_1\,\frac{\int_{\delta}^{x} f(y)\,dy}{\epsilon b^2 f(x)\,[2x\,(4x+c)]} + a_2\,\frac{1}{\epsilon b^2 f(x)\,[2x\,(4x+c)]},
\]

where $\delta$ is some constant chosen such that $x_t \ge \delta$ a.s. and
\[
f(x) = \exp\left(-2\int^{x}\frac{2\epsilon(\mu-y)}{\epsilon^2 b^2\,2y\,(4y+c)}\,dy\right) = \exp\left(-\frac{2}{\epsilon b^2}\int^{x}\frac{\mu}{y\,(4y+c)}\,dy + \frac{2}{\epsilon b^2}\int^{x}\frac{1}{4y+c}\,dy\right).
\]

Since the dependence on the lower boundary enters in such a way that we can cancel it out by redefining the constant, we redefine $f(x)$ to be

\[
f(x) = \exp\left(-\frac{2\mu}{\epsilon b^2}\,\frac{\log(x)-\log(4x+c)}{c} + \frac{2}{\epsilon b^2}\,\frac{\log(4x+c)}{4}\right) = \left(\frac{4x+c}{x}\right)^{\frac{2\mu}{\epsilon b^2 c}}\,(4x+c)^{\frac{1}{2\epsilon b^2}}.
\]


By now again noting that the integral of $f(x)$ enters linearly, we can ignore the effect of the lower boundary. We also have that
\[
\int_{\delta}^{x} f(y)\,dy \approx \frac{c^{\frac{2\mu}{\epsilon b^2 c}}\, x^{-\frac{2\mu}{\epsilon b^2 c}+\frac{1}{2\epsilon b^2}+1}}{-\frac{2\mu}{\epsilon b^2 c}+\frac{1}{2\epsilon b^2}+1} + \mathrm{const}(\delta).
\]

Putting it all together we have that
\[
p_\infty(x) \approx \frac{1}{2x\,(4x+c)}\left(\frac{x}{4x+c}\right)^{\frac{2\mu}{\epsilon b^2 c}}\left(\frac{1}{4x+c}\right)^{\frac{1}{2\epsilon b^2}}\left[-a_1\, x^{-\frac{2\mu}{\epsilon b^2 c}+\frac{1}{2b^2\epsilon}+1} + a_2\right].
\]

By dropping the negative $a_1$ part we get that
\[
p_\infty(x) \propto \frac{1}{2x}\left(\frac{x}{4x+c}\right)^{\frac{2\mu}{\epsilon b^2 c}}\left(\frac{1}{4x+c}\right)^{\frac{1}{2b^2\epsilon}+1}.
\]
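A quick numerical sanity check (our own sketch, not part of the thesis): a stationary density for this one-dimensional diffusion should make the probability flux $J(x) = A(x)p(x) - \frac{d}{dx}[D(x)p(x)]$ vanish, where $A(x) = 2\epsilon(\mu-x)$ is the drift and $D(x) = \epsilon^2 b^2 x(4x+c)$ is the diffusion coefficient (half the squared noise amplitude). The parameter values below are arbitrary illustrations:

```python
import numpy as np

# Illustrative parameter values (assumptions, chosen only for the check)
eps, mu, b, c = 0.5, 0.5, 0.8, 1.0

x = np.linspace(0.1, 3.0, 4001)
# p_infty up to a normalizing constant, as derived above
p = (1 / (2 * x)) * (x / (4 * x + c)) ** (2 * mu / (eps * b**2 * c)) \
    * (1 / (4 * x + c)) ** (1 / (2 * b**2 * eps) + 1)

A = 2 * eps * (mu - x)                 # drift of the SDE
D = eps**2 * b**2 * x * (4 * x + c)    # diffusion coefficient
J = A * p - np.gradient(D * p, x)      # stationary probability flux
```

Up to finite-difference error, $J$ vanishes on the grid, confirming that $p_\infty$ solves $\mathcal{L}^*p_\infty = 0$ with zero flux.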

2.8.8 Variogram for $x_t$

We here want to derive an expression for the variogram of a process $x_t$, i.e. $\overline{(x_{t+\tau}-x_t)^2}$ for $\tau > 0$ (an overline denoting expectation), satisfying the SDE
\[
dx_t = 2\epsilon(\mu-x_t)\,dt + \epsilon b\sqrt{2x_t\,(4x_t+c)}\,dB_t, \tag{2.56}
\]
where $\mu, b, c\in\mathbb{R}$ are some given constants. If we integrate (2.56) and take expectations we get that
\[
\overline{x}_t = \overline{x}_0 + 2\epsilon\mu t - 2\epsilon\int_0^t \overline{x}_s\,ds,
\]
so by solving this ODE we obtain $\overline{x}_t = \mu\left(1-e^{-2\epsilon t}\right) + \overline{x}_0\,e^{-2\epsilon t}$. In the same way we can use the integrated version of $x_{t+\tau}$ and condition on $x_t$ to get that
\[
\overline{x_t x_{t+\tau}} = \overline{x_t^2} + 2\epsilon\,\overline{x}_t\,\mu\tau - 2\epsilon\int_0^{\tau} \overline{x_t x_{t+s}}\,ds.
\]


By solving this ODE we get that $\overline{x_t x_{t+\tau}} = \mu\,\overline{x}_t\left(1-e^{-2\epsilon\tau}\right) + \overline{x_t^2}\,e^{-2\epsilon\tau}$. Next apply Itô's lemma to (2.19) and take expectations to get that
\[
\begin{aligned}
\overline{x_t^2} &= \overline{x_0^2} + 4\epsilon\left(\mu + \frac{\epsilon b^2 c}{2}\right)\int_0^t \overline{x}_s\,ds - 4\epsilon\left(1-2\epsilon b^2\right)\int_0^t \overline{x_s^2}\,ds\\
&= \overline{x_0^2} + 4\epsilon\left(\mu + \frac{\epsilon b^2 c}{2}\right)\left[\frac{\overline{x}_0}{2\epsilon}\left(1-e^{-2\epsilon t}\right) - \frac{\mu}{2\epsilon}\left(1-e^{-2\epsilon t}\right) + \mu t\right] - 4\epsilon\left(1-2\epsilon b^2\right)\int_0^t \overline{x_s^2}\,ds.
\end{aligned}
\]

By solving this ODE we get that
\[
\begin{aligned}
\overline{x_t^2} &= \overline{x_0^2}\,e^{-4\epsilon(1-2b^2\epsilon)t} - \frac{2\left(\mu-\overline{x}_0\right)\left(\mu+\frac{\epsilon b^2 c}{2}\right)}{1-4\epsilon b^2}\left[e^{-2\epsilon t} - e^{-4\epsilon(1-2\epsilon b^2)t}\right]\\
&\quad+ \frac{\mu\left(\mu+\frac{\epsilon b^2 c}{2}\right)}{1-2\epsilon b^2}\left(1-e^{-4\epsilon(1-2\epsilon b^2)t}\right).
\end{aligned}
\]

We will now use that in all the cases of interest here $b \le 2$. We can now plug all of this into the definition of the variogram, let $t\to\infty$, and discard terms of smaller order to get that

\[
\begin{aligned}
\overline{(x_{t+\tau}-x_t)^2} &= \overline{x_{t+\tau}^2} + \overline{x_t^2} - 2\,\overline{x_t x_{t+\tau}}\\
&\approx \overline{x_t^2}\left(1-e^{-2\epsilon\tau}\right)^2 - \frac{2\left(\mu-\overline{x}_t\right)\left(\mu+\frac{\epsilon b^2 c}{2}\right)}{1-4\epsilon b^2}\left[e^{-2\epsilon\tau}-e^{-4\epsilon\tau}\right]\\
&\quad+ \frac{\mu\left(\mu+\frac{\epsilon b^2 c}{2}\right)}{1-2\epsilon b^2}\left(1-e^{-4\epsilon\tau}\right) - 2\mu\,\overline{x}_t\left(1-e^{-2\epsilon\tau}\right)\\
&\approx \frac{\mu\left(\mu+\frac{\epsilon b^2 c}{2}\right)}{1-2\epsilon b^2}\left(1-e^{-2\epsilon\tau}\right)^2 + \frac{\mu\left(\mu+\frac{\epsilon b^2 c}{2}\right)}{1-2\epsilon b^2}\left(1-e^{-4\epsilon\tau}\right) - 2\mu^2\left(1-e^{-2\epsilon\tau}\right)\\
&= \left[\frac{2\mu\left(\mu+\frac{\epsilon b^2 c}{2}\right)}{1-2\epsilon b^2} - 2\mu^2\right]\left(1-e^{-2\epsilon\tau}\right)\\
&= \frac{2\mu\epsilon b^2\left(\frac{c}{2}+2\mu\right)}{1-2\epsilon b^2}\left(1-e^{-2\epsilon\tau}\right).
\end{aligned}
\]
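The moment recursions used above are easy to check numerically. The following sketch (our own, with illustrative parameter values) integrates the two moment ODEs to stationarity and compares the resulting variogram level with the closed-form expression:

```python
import math

# Illustrative parameters (assumptions)
eps, mu, b, c = 0.05, 0.3, 0.5, 1.0

# Euler integration of d m1/dt = 2 eps (mu - m1),
# d m2/dt = 4 eps (mu + eps b^2 c / 2) m1 - 4 eps (1 - 2 eps b^2) m2
m1, m2, dt = 0.0, 0.0, 1e-3
for _ in range(400_000):                 # integrate to t = 400 >> relaxation time
    dm1 = 2 * eps * (mu - m1)
    dm2 = (4 * eps * (mu + eps * b**2 * c / 2) * m1
           - 4 * eps * (1 - 2 * eps * b**2) * m2)
    m1 += dt * dm1
    m2 += dt * dm2

var_stat = m2 - mu**2                    # stationary variance of x_t
tau = 10.0
vario_num = 2 * var_stat * (1 - math.exp(-2 * eps * tau))
vario_formula = (2 * mu * eps * b**2 * (c / 2 + 2 * mu)
                 / (1 - 2 * eps * b**2)) * (1 - math.exp(-2 * eps * tau))
```

The two numbers agree, confirming that the bracketed prefactor in the final formula is just twice the stationary variance of $x_t$.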


2.8.9 Variogram for $\lambda_1^t$ in the case of a flat estimator

We here find the variogram for processes of the form
\[
d\lambda_1^t = \epsilon\left[\left(\lambda_1 - \lambda_1^t\right)dt + \sqrt{2}\,\lambda_1\,dB_{1,t} + \sqrt{2}\,\lambda_1^t\,dB_{2,t}\right]
\]
using the same approach as in Appendix 2.8.1. Hence start by noticing that
\[
\lambda_1^t = \lambda_1^0 + \epsilon\lambda_1 t - \epsilon\int_0^t \lambda_1^s\,ds + \sqrt{2}\,\epsilon\left(\lambda_1 B_{1,t} + \int_0^t \lambda_1^s\,dB_{2,s}\right),
\]
\[
\overline{\lambda_1^t} = \overline{\lambda_1^0} + \epsilon\lambda_1 t - \epsilon\int_0^t \overline{\lambda_1^s}\,ds,
\]

which gives us that $\overline{\lambda_1^t} = \lambda_1^0\,e^{-\epsilon t} + \lambda_1\left(1-e^{-\epsilon t}\right)$. Next use this and Itô's formula to get that the second moment of $\lambda_1^t$ satisfies
\[
\begin{aligned}
\overline{(\lambda_1^t)^2} &= \left(\lambda_1^0\right)^2 + 2\epsilon\lambda_1\int_0^t \overline{\lambda_1^s}\,ds + 2\epsilon^2(\lambda_1)^2 t - 2\epsilon(1-\epsilon)\int_0^t \overline{(\lambda_1^s)^2}\,ds\\
&= \left(\lambda_1^0\right)^2 + 2\epsilon\lambda_1\int_0^t\left[\lambda_1^0\,e^{-\epsilon s} + \lambda_1\left(1-e^{-\epsilon s}\right)\right]ds + 2\epsilon^2(\lambda_1)^2 t - 2\epsilon(1-\epsilon)\int_0^t \overline{(\lambda_1^s)^2}\,ds\\
&= \left(\lambda_1^0\right)^2 + 2\lambda_1\left(\lambda_1^0-\lambda_1\right)\left(1-e^{-\epsilon t}\right) + 2\epsilon(1+\epsilon)(\lambda_1)^2 t - 2\epsilon(1-\epsilon)\int_0^t \overline{(\lambda_1^s)^2}\,ds.
\end{aligned}
\]

Solving this ODE we get that
\[
\overline{(\lambda_1^t)^2} = \left(\lambda_1^0\right)^2 e^{-2\epsilon(1-\epsilon)t} + \frac{2}{1-2\epsilon}\,\lambda_1\left(\lambda_1^0-\lambda_1\right)\left[e^{-\epsilon t} - e^{-2\epsilon(1-\epsilon)t}\right] + (\lambda_1)^2\,\frac{1+\epsilon}{1-\epsilon}\left[1 - e^{-2\epsilon(1-\epsilon)t}\right].
\]

As for the autocovariance term we get for any $\tau > 0$ that
\[
\overline{\lambda_1^t\lambda_1^{t+\tau}} = \overline{(\lambda_1^t)^2} + \tau\epsilon\lambda_1\,\overline{\lambda_1^t} - \epsilon\int_0^{\tau} \overline{\lambda_1^t\lambda_1^{t+s}}\,ds.
\]

By again treating this as an ODE for any fixed value of $\tau$ and conditioning on the $\sigma$-algebra at $t$, we get that $\overline{\lambda_1^t\lambda_1^{t+\tau}} = \overline{(\lambda_1^t)^2}\,e^{-\epsilon\tau} + \lambda_1\,\overline{\lambda_1^t}\left(1-e^{-\epsilon\tau}\right)$. We thus get that the variogram


$v_t(\tau)$ for a fixed $t$ is
\[
\begin{aligned}
v_t(\tau) &= \overline{\left(\lambda_1^{t+\tau}-\lambda_1^t\right)^2} = \overline{(\lambda_1^{t+\tau})^2} + \overline{(\lambda_1^t)^2} - 2\,\overline{\lambda_1^t\lambda_1^{t+\tau}}\\
&= \overline{(\lambda_1^t)^2}\,e^{-2\epsilon(1-\epsilon)\tau} + \frac{2}{1-2\epsilon}\,\lambda_1\left(\overline{\lambda_1^t}-\lambda_1\right)\left[e^{-\epsilon\tau} - e^{-2\epsilon(1-\epsilon)\tau}\right] + (\lambda_1)^2\,\frac{1+\epsilon}{1-\epsilon}\left[1-e^{-2\epsilon(1-\epsilon)\tau}\right]\\
&\quad+ \overline{(\lambda_1^t)^2} - 2\,\overline{(\lambda_1^t)^2}\,e^{-\epsilon\tau} - 2\lambda_1\,\overline{\lambda_1^t}\left(1-e^{-\epsilon\tau}\right).
\end{aligned}
\]

We can now use that $\overline{\lambda_1^t} = \lambda_1$ and $\overline{(\lambda_1^t)^2} = (\lambda_1)^2\,\frac{1+\epsilon}{1-\epsilon}$ in stationarity to get that the variogram is
\[
\begin{aligned}
v(\tau) &= \overline{(\lambda_1^{t+\tau})^2} + \overline{(\lambda_1^t)^2} - 2\,\overline{\lambda_1^t\lambda_1^{t+\tau}}\\
&= 2(\lambda_1)^2\,\frac{1+\epsilon}{1-\epsilon} - 2(\lambda_1)^2\,\frac{1+\epsilon}{1-\epsilon}\,e^{-\epsilon\tau} - 2(\lambda_1)^2\left(1-e^{-\epsilon\tau}\right)\\
&= \frac{4\epsilon(\lambda_1)^2}{1-\epsilon}\left(1-e^{-\epsilon\tau}\right).
\end{aligned}
\]
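As a quick floating-point check of the last simplification (our own sketch; any positive $\lambda_1$ and small $\epsilon$ will do), one can verify that the three stationary terms collapse to the stated closed form:

```python
import math

def v_terms(tau, lam1, eps):
    """Stationary variogram assembled from the three terms above."""
    m2 = lam1**2 * (1 + eps) / (1 - eps)   # stationary second moment
    e = math.exp(-eps * tau)
    return 2 * m2 - 2 * m2 * e - 2 * lam1**2 * (1 - e)

def v_closed(tau, lam1, eps):
    """The simplified expression 4 eps lam1^2 / (1 - eps) * (1 - e^{-eps tau})."""
    return 4 * eps * lam1**2 / (1 - eps) * (1 - math.exp(-eps * tau))
```

The agreement is exact up to rounding, since $2(m_2-\lambda_1^2) = 4\epsilon\lambda_1^2/(1-\epsilon)$.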


Chapter 3

Optimal non-linear shrinkage for a large class of models and estimators

3.1 Literature review

The problem of estimating covariance matrices comes up naturally in a large number of applications, such as finance, signal processing, climatology and biostatistics. In traditional asymptotic statistics one works under the assumption that the number of variables, $n$, is fixed, but the number of samples, $N$, goes to infinity. Under normality assumptions, the standard sample covariance matrix then converges to the true covariance matrix. In order for these conclusions to be approximately true for any finite sample size, it is thus necessary that $N \gg n$. In many cases, however, the number of variables is proportional to the number of samples. As an example, assume that one wishes to estimate the covariance matrix of the returns of the stocks listed on the S&P 500 index using 2 years of data. This corresponds to having $n/N \approx 1$. Thus the more interesting and realistic assumption is to let $n$ and $N$ go to infinity jointly, under the constraint that $n/N$ converges to some fixed ratio $c$. The spectral properties of matrices under this type of asymptotics are a well-studied subject that originally arose from studying energy levels in nuclear physics. The


first result for covariance matrices is [39], which showed that the empirical spectral distribution (e.s.d.) of the sample covariance matrix, i.e. the distribution of its eigenvalues, converges to the now so-called Marchenko-Pastur distribution if the population eigenvalues are all equal to 1. Since then there has been an explosion of research within this area, both theoretical and applied to various statistical problems. The majority of these papers focus on various distributional properties of the eigenvalues or, in a few cases, the eigenvectors. One of the main conclusions from all of these is that the standard sample covariance matrix is a very inefficient estimator within this regime. Until recently, however, the theory could only partly quantify this effect, and it offered no way to fix it. Hence most research in estimation has instead leveraged model-specific properties such as sparseness or factor structures, or used extra assumptions leading to regularization estimators [4,21,51].

A different approach that dates far back is to narrow down the class of estimators to those which are rotationally equivariant with respect to the sample covariance. This corresponds to the logic that if we rotate the data, then the estimator should be rotated correspondingly. By then using an expected-loss framework, one gets for most common loss functions that the optimal estimator has the same set of eigenvectors as the sample covariance, but with the eigenvalues replaced. The formula used to replace the eigenvalues is sometimes referred to as the shrinkage formula. This type of estimator was first introduced by Stein in [56,57] and has since been used extensively, see e.g. [14,24,32,61]. As pointed out in [35] and the papers referred to therein, the Stein estimator is still considered the 'gold standard' that is hard to beat empirically. However, as they also point out, it has several theoretical flaws, such as:

• It does not directly minimize the risk, but instead an unbiased estimate of it.

• The process may end up with an estimator that is not positive semi-definite, and hence not a valid covariance matrix.

• It requires normality.

• It is only defined when the sample size exceeds the dimension.


The story behind the solution to these problems began in [31], where a very innovative application of large-sample eigenvector asymptotics was proposed to retrieve the optimal rotation-equivariant estimator. Even though their results are only valid when the sample size and the number of variables are infinite, they show empirically that the results work well with as few as 10 variables. They also show how the same idea can be used to find an optimal estimator of the precision matrix, i.e. the inverse of the covariance matrix. All the estimators from [31] were, however, oracle estimators, i.e. they depend directly on an unobservable quantity. More exactly, they required knowledge of the population e.s.d. of the covariance matrix. To make these estimators bona fide, i.e. only data dependent, the subsequent papers [33] and [35] found an unbiased estimator of the e.s.d. of the population covariance matrix. A similar idea had previously been studied in [18]; however, in [33] they claim that this idea could not be adapted to the current situation. In [33] it was shown theoretically that the full bona fide estimator is optimal in the asymptotic limit. They also showed empirically that for finite sample sizes this estimator performs close to the optimal estimator. All of these articles focused on minimizing the expected risk under a Frobenius loss, which differs from the loss function introduced by [57]. In [35] the conclusions of [31] are also re-derived by minimizing the asymptotic risk function, instead of first minimizing the finite-sample loss function and then studying its asymptotics. This idea allows them to also study loss functions beyond the Frobenius loss, such as Stein's loss. They also show that there is, in the asymptotic limit, a remarkably close connection between the Frobenius and Stein losses. Namely, the estimator of the covariance obtained under Stein's loss corresponds exactly to the inverse of the estimator of the precision matrix under Frobenius loss. Similar ideas have also been applied to the spiked covariance model in [15], where they show some similar results under a large collection of different loss functions, including the previously mentioned ones.

So far all the covariance estimators we have mentioned are of the form $S_n = N^{-1}A_n^{1/2}X_nX_n^*A_n^{1/2}$, where $X_n$ is an $n\times N$ matrix with i.i.d. entries and $A_n$ is the population covariance matrix, as the estimators were required to be rotation-equivariant. A more general matrix model related to covariance matrices is $S_n = N^{-1}A_n^{1/2}X_nB_nX_n^*A_n^{1/2}$, where $B_n$ is a Hermitian matrix. In case $B_n$ is diagonal, deterministic and with a trace equal to 1, this is a regular weighted covariance estimator. There are also applications in which $B_n$ is random and non-diagonal. The first to show the full generalization of the Marchenko-Pastur theorem to these types of matrices was [62], and since then more papers have shown other properties of these types of matrices (e.g. [10,11,13,46]). In [19] a similar model was studied that was tailored to elliptical distributions; it only allowed diagonal $B_n$, but it had much higher distributional flexibility on $X_n$. What some of these papers discuss is how the theory of [62] can be applied to robust M-estimators, a class of estimators commonly used for elliptically distributed data (see e.g. [40]). Here [13] investigated the distributional properties of this type of estimator, and [8,12] showed how the idea of linear shrinkage (i.e. taking a convex combination of the estimator and a scaled identity matrix) can be applied there.

3.2 On general large dimensional matrices

Let $X_n$, for $n = 1, 2, \dots$, be an $n\times N$ matrix consisting of complex i.i.d. variables with mean 0 and variance 1. Also let $A_n$ be a Hermitian matrix with $A_n^{1/2}$ being any square-root matrix of it, and let $B_n$ be any Hermitian matrix. The matrix of interest in this chapter is the Hermitian matrix
\[
S_n = N^{-1}A_n^{1/2}X_nB_nX_n^*A_n^{1/2}. \tag{3.1}
\]
In the case where $B_n = I_N$, $S_n$ can be interpreted as the sample covariance from $N$ samples of $A_n^{1/2}X_{\bullet,1}$ (where $X_{\bullet,1}$ denotes the first column of $X_n$), which has population covariance matrix $A_n$. As we will see next, the case $B_n \neq I_N$ offers some interesting applications. We can also let $B_n$ be a random matrix, which opens up the model for even more applications. Our goal will be to eventually be able to estimate $A_n$ when both $n\to\infty$ and $N\to\infty$ with $n/N \to c \in (0,\infty)$. As this requires some results from random matrix theory, the latter part of this section will give a background on all the necessary results. Before doing this we will look at


some applications for which this model can be used.

3.2.1 Motivation

Our model (3.1) can, as mentioned, be interpreted differently depending on the properties of $B_n$. Here we will see three different cases: one where $B_n$ is deterministic and diagonal, one where it is random and diagonal, and finally one where it is deterministic and Hermitian. We will also come back to these examples in Section 3.4 to show how our theory applies to them and to show some empirical results.

Weighted covariance estimation

The most trivial example of where (3.1) is applicable is if we wish to estimate $A_n$ from data $A^{1/2}X_{\bullet,1},\dots,A^{1/2}X_{\bullet,N}$ and also have some external, independent knowledge regarding the different samples. This external information could for example be the quality of each sample or how long ago it was taken. This type of estimator is, for example, commonly used for financial data when estimating the current date's covariance matrix from historical data, as in Chapter 2. There, one motivates these estimators by the fact that if there have been changes in the population covariance matrix, then the most recent data will be the most accurate.

To adapt our model to these estimators, let $b_i$ be the weight of the $i$th sample, with the condition that $\sum_{i=1}^N b_i = 1$. This is not a necessary condition, but it implies asymptotic unbiasedness. By defining $B_n = \operatorname{Diag}(N b_1, \dots, N b_N)$, the weighted covariance estimator can be written as

, . . . , N · bN), the weighedcovariance estimator covariance can be written as

Sn =Nÿ

i=1

biA1/2

n X•,iXú•,iA

1/2

n = N≠1A1/2

n XnBnXúnA1/2

n .

One commonly used weighted covariance estimator is the so-called exponential weightedmoving average (EWMA) estimators. This is defined by a parameter – œ (0, 1) andthe recursive equation Bi+1,i+1

= N(1 ≠ –)Bi,i, B1,1 = N .

Page 95: STUDIES IN COVARIANCE ESTIMATION AND APPLICATIONS IN

CHAPTER 3. NON-LINEAR SHRINKAGE ESTIMATION 81

Regular and robust covariance estimation for elliptic data

A random vector is said to have from an elliptical distribution if it has the form

yi = µ + ’i�xi (3.2)

where µ œ Rn, ’i are identically distributed independent variables in R, xi’s are uni-form variables on any scaled unit sphere in Rm and has covariance matrix � œ Rm◊m

and � œ Rn◊m. By scaling the xi’s appropriately (i.e. by using the scaling whichis proportional to n) the distributional requirement of xi falls within our model, aswe can factor out the dependence of the covariance matrix. We can hence let xi

correspond to our previous mean 0, variance 1 variables also in this case. These arefurther discussed and applied to financial data in e.g. [23, 41]. Examples of distribu-tions that fall within this class are normal, Cauchy and t-distributions. Hence thismodel, compared to the standard case, allows us to look at distributions which haveheavy tails. In the context of financial data, it is a stylized empirical fact that returndata series on an instrument have heavy tails and failing to account to this may implya significant underestimation of risk [41].

Thus let Xn œ RN◊m for some m > 0, set An = ���ú and Bn = Diag(’2

1

, . . . , ’2

N).As the mean vector µ only contributes with a finite rank perturbation when we areestimating the covariance, it will not change the limiting e.s.d. and thus we willassume w.l.o.g. that µ = 0. Hence the sample covariance of the elliptical data is bydefinition

Sn =Nÿ

i=1

yiyúi = N≠1A1/2

n XnBnXúnA1/2

n ,

where A1/2

n = �1/2�. Notice that A1/2

n is no longer a square matrix, however this willnot cause any problems as we will see. To further simplify the estimation process wewill assume that E [’i] = 1, so that the covariance matrix of yi is simply An. Sincethe expectation of ’i is a nuisance parameter(which can be seen by scaling � andthen scaling ’i inversely), this constraint does not make us lose any generality. Thisconstraint also gives us the benefit of being able to interpret the sample covariance

Page 96: STUDIES IN COVARIANCE ESTIMATION AND APPLICATIONS IN

CHAPTER 3. NON-LINEAR SHRINKAGE ESTIMATION 82

as a weighted estimator of the variance, where the weight is simply ’i.As there is no correlation between ’i and the information we are after(An), this

means that the ’’s are only going to put random weights on the sample. As ’ mighthave a high variance, this means that some samples might account for most of theweight. The e�ect of ’ is hence to increase the risk of the estimator as the optimal casewould be when an equal weight is placed on all samples. Because of this phenomenonthere has been a lot of e�ort into developing e�ective estimators which approximatelydo this. We will look more closely at one of the most commonly used one of these,the so-called Maronna’s M-estimator [40]. This is defined as the solution, An, to theequation

An = N≠1

Nÿ

i=1

u3 1

nyú

i A≠1

n yi

4yiy

úi , (3.3)

where u : [0, Œ) æ [0, Œ) is nonnegative, continuous and non-increasing. As anexample, we can choose u(x) = 1+–

–+xfor some fix – > 0. In the case where – = 0 this

is the so-called Tyler’s M-estimator [58], however will not look further at this case.The interesting aspect of this estimator is that when n/N æ c and n/m æ d œ (0, Œ)as n æ Œ, the weights u

11

nyú

i A≠1

n yi

2will only depend on ’i. Hence by doing a change

of variables, (3.3) can be written as

An = N≠1A1/2

n XnBnXúnA1/2

n .

in the limit, and thus it falls under our model.

Spatio-temporal data

Another application of our results is for so-called spatio-temporal data. The applica-tion of asymptotic models to this area has been hinted earlier in [46]. Here we canthink of having one sample of size Nn ◊ 1, with a covariance matrix �n = An ¢ Bn.This model can be interpreted as a sample from an image where An denotes the co-variance between the pixels on the same longitudes and Bn denotes the covarianceamong the pixels at the same latitude. The main assumptions of this model is clearly

Page 97: STUDIES IN COVARIANCE ESTIMATION AND APPLICATIONS IN

CHAPTER 3. NON-LINEAR SHRINKAGE ESTIMATION 83

that the covariance among each latitude remains fixed while the longitude is varyingand vice versa. These type of models are for example used in Climatology [59]. Thismodel can also be though of as having data coming from a linear time series [37, 47]so that An denotes the covariance at each point in time and Bn contains informationabout the autocovariance.

By splitting up this one sample into a matrix X, the sample estimator of An willsatisfy

An = Tr(Bn)N

N≠1A1/2

n XnBnXúnA1/2

n .

Similarly by symmetry the sample estimator of Bn will satisfy

Bn = Tr(An)n

n≠1B1/2

n XúnAnXnB1/2

n .

3.2.2 The Marchenko-Pastur theorem and generalizations

Motivated by the examples from the previous section, we will now formalize theassumptions that we will use throughout the remaining part of this chapter.

Assumption 3.2.1. Let the following hold:

• The variables n and N go to infinity simultaneously in the sense that the limitc = limnæŒ n/N œ (0, Œ) exists.

• Xn œ Rn◊N consist of iid variables with mean 0, variance 1 which also satisfythe Lindeberg type condition

1nN”2

nÿ

i=1

Nÿ

j=1

EË|Xi,j|2 1

(”Ô

N,Œ)

(|Xi,j|)È

æ 0.

for each ” > 0 as n æ Œ.

• An is a n ◊ n random (or deterministic) positive definite Hermitian matrixindependent of Xn and Bn.

Page 98: STUDIES IN COVARIANCE ESTIMATION AND APPLICATIONS IN

CHAPTER 3. NON-LINEAR SHRINKAGE ESTIMATION 84

• Bn is a N ◊ N random (or deterministic) positive definite Hermitian matrixindependent of Xn and An.

• If we denote the eigenvalues of An by ·A,1, . . . , ·A,n with ·A,1 Ø ·A,2 Ø · · · Ø ·A,n,then HAn(·A) = 1

n

qni=1

1[·A,j ,Œ]

(·) converges a.s. to a non-random limit HA atevery continuity point of HA. HA defines a probability distribution functionwhose support is included in the compact interval [hA,1, hA,2] where 0 < hA,1 ÆhA,2 < Œ.

• If we denote the eigenvalues of Bn by ·B,1, . . . , ·B,N with ·B,1 Ø ·B,2 Ø · · · Ø·B,N„ then HBn(·B) = 1

n

qni=1

1[·B,j ,Œ]

(·) converges a.s. to a non-random limitHB at every continuity point of HB. HB defines a probability distribution func-tion whose support is included in the interval [hB

1

, Œ) where 0 < hB,1 < Œ.

In order to study our version of the elliptical model (see Section 3.2.1) we onlyhave to make the extra assumption that

d := limnæŒ

n/m œ (0, Œ).

To get the distribution of Sn under this model we can directly follow the ideas of [31]who studies d ◊ Sn. Hence by a renormalization we can obtain the law of Sn.

Before continuing let us denote the sample eigenvalues of Sn by (⁄1

, . . . , ⁄n) with⁄

1

Ø ⁄2

Ø · · · Ø ⁄n, and the corresponding sample eigenvectors by (u1

, . . . , un).Similarly let v

1

, . . . , vn denote the population eigenvectors of An. Also define thee.s.d. of Sn by Fn(⁄) = n≠1

qni=1

1[⁄i,Œ)

(⁄). Recall that the Stieltjes transform of anydistribution function G is defined as [3]

mG(z) =⁄

R

1⁄ ≠ z

dG(⁄), z œ C+.

Hence if A œ Cn◊n is a Hermitian matrix with spectral distribution HA and eigenval-ues ⁄

1

, . . . , ⁄n, its Stieltjes transform is

mHA(z) = n≠1

nÿ

i=1

1⁄i ≠ z

= n≠1 TrË(A ≠ zI)≠1

È.

Page 99: STUDIES IN COVARIANCE ESTIMATION AND APPLICATIONS IN

CHAPTER 3. NON-LINEAR SHRINKAGE ESTIMATION 85

The Stieltjes transform of SN thus simply reads mFn(z) = Tr [(Sn ≠ zI)≠1]. Recallnext that the Marchenko-Pastur theorem [39] tell us how to find mFn under as-sumptions similar to Assumptions 3.2.1 with the additional constraint that Bn = In.Since then the limiting distributions and its properties have also been studied forother covariance models. Most noteworthy for our purpose is [62], who extended theMarchenko-Pastur theorem to the case of Bn being any Hermitian matrix. The maindi�erence compared to the case of Bn = In which Marchenko-Pastur studied, is thatmFn(z) now is coupled to the quantity

eFn(z) := TrË(Sn ≠ zIn)≠1An

È.

By using the notation from [46], the main conclusion of [62] is that under Assumption3.2.1 it follows that mFn(z) a.s.≠≠æ mF (z) ’z œ C+,eFn(z) a.s.≠≠æ eF (z) ’z œ C+ and

mF (z) =⁄

R

1a

sR

b1+bceF (z)

dHB(b) ≠ zdHA(a), ’z œ C+,

eF (z) =⁄

R

a

as

Rb

1+bceF (z)

dHB(b) ≠ zdHA(a), ’z œ C+.

(3.4)

Also Fn(⁄) a.s.≠≠æ F (⁄) at all continuity points of F . Thus there exists some limitingdensity F for the eigenvalues of Sn. In order to calculate this, we start by notingthat [11] shows the existence of the limit

lim־0

mF (⁄ + i÷) æ mF (⁄), ’⁄ œ

Y]

[R If c œ (0, 1)R \ 0 If c > 1

. (3.5)

and that F has a continuous derivative F Õ(⁄) = fi≠1Im [mF (⁄)]. This thus givesus a way to find the limiting density from the Stieltjes transform. Next define thematrix

¯Sn = N≠1B1/2

n XnAnXúnB1/2

n and denote its e.s.d. by¯Fn. Then it follows (as

in e.g. [62]) that¯Fn = (1 ≠ c)1

[0,+Œ)

+ cFn. This is distribution is useful in the casewhere c > 1 since m

¯

F (⁄) is then defined ’⁄ œ R, whereas mF (⁄) in not defined at

Page 100: STUDIES IN COVARIANCE ESTIMATION AND APPLICATIONS IN

CHAPTER 3. NON-LINEAR SHRINKAGE ESTIMATION 86

⁄ = 0. As a generalization of mFn(z) and eFn(z), define the collection of functions

◊(k)

n (z) = n≠1 TrË(Sn ≠ zIn)≠1(z)Ak

n

È, ’z œ C+ and k = 0, 1, . . . (3.6)

Note here that θ_n^{(0)}(z) = m_{F_n}(z) and θ_n^{(1)}(z) = e_{F_n}(z). This object was, to our knowledge, first studied in [62], where it appeared as a byproduct in proving Theorem 4.1.1. More exactly, they show that under Assumption 3.2.1, θ_n^{(k)}(z) → θ^{(k)}(z) a.s., and θ^{(k)}(z) satisfies

θ^{(k)}(z) = ∫_R [ a^k / ( a ∫_R b/(1 + bc·e_F(z)) dH_B(b) − z ) ] dH_A(a), ∀z ∈ C⁺. (3.7)

These functions were also mentioned, but not considered further, in [19]. The first time they found usage was in [31], who however did not use any of the previously developed theory when studying them. Under the constraint that B_n = I_n, they extended these objects to a more general class of functionals

θ_n^g(z) = n⁻¹ Tr[(S_n − zI_n)⁻¹ g(A_n)], (3.8)

where g is any bounded and continuous function on [h_{A,1}, h_{A,2}] with finitely many discontinuities. Our goal now is to find a representation of θ^g(z) similar to (3.6) under Assumption 3.2.1. Recall first that in [31] they show that under Assumption 3.2.1 (with the Lindeberg condition exchanged for a bounded 12th moment) and the constraint B_n = I, θ_n^g(z) → θ^g(z) a.s. and θ^g(z) satisfies

θ^g(z) = ∫_R [ g(a) / ( a[1 − c − cz·m_F(z)] − z ) ] dH_A(a). (3.9)

Their proof of this can partly be seen as a simplified version of the proof of (3.7) from [62]. We are now ready to state our first theorem, which generalizes all of these previous results and is also a main ingredient in proving the remaining theorems of this chapter.

Theorem 3.2.1. Under Assumption 3.2.1, we have for all z ∈ C⁺ that θ_n^g(z) → θ^g(z) a.s. if g is a bounded and continuous function on [h_{A,1}, h_{A,2}] with finitely many points of discontinuity, and θ^g(z) satisfies

θ^g(z) = ∫_R [ g(a) / ( a ∫_R b/(1 + bc·e_F(z)) dH_B(b) − z ) ] dH_A(a), (3.10)

and similarly if the data is elliptic.

The proof of this is deferred to Section 3.6.1. In the next section we will see two possible applications of (3.10), where one of them could have been shown with only (3.7). Notice that other applications are possible, as shown in [31].

3.3 Asymptotically optimal bias correction for the eigenvectors

Let us now start looking into the problem of estimating a covariance matrix from our model. The first step is to determine exactly what the covariance is, as we have both A_n and B_n in our model. To this end we note that the expectation of S_n is

E[S_n] = N⁻¹ A_n^{1/2} E[X_n B_n X_n*] A_n^{1/2} = N⁻¹ A_n^{1/2} E[X̃_n Λ_{B,n} X̃_n*] A_n^{1/2} = (Tr(B_n)/N) A_n,

where X̃_n = X_n V_{B,n}, Λ_{B,n} is the diagonal matrix of eigenvalues of B_n, and V_{B,n} is the matrix containing the eigenvectors of B_n. Also define the quantity β_n = Tr(B_n)/N. Hence our interest is to find β_n × A_n, so to simplify the problem we can assume that β_n = 1 w.l.o.g., since we can simply scale A_n by β_n if β_n ≠ 1. Thus our inferential goal is simply to estimate A_n.

Assume now that we have any covariance estimator of the form (3.1) and would like to find a better estimator. One way to do this is to restrict the search scope to estimators that have the same eigenvectors as our original estimator (3.1), which we will hereafter refer to as the base estimator. This corresponds to looking within the class of rotationally equivariant estimators when B = I. A way to think about this, when B comes from a weighted estimator, is that B is first chosen in (3.1) to get estimates of the eigenvectors. We then use the theory developed here (which is asymptotically independent of the estimated eigenvalues) to find the corresponding


eigenvalues to use for each eigenvector. If we first let

S_n = U_n D_n U_n*, D_n = Diag(λ_1, . . . , λ_n), (3.11)

where U_n is the matrix containing the eigenvectors of S_n, then any estimator within this described class can be written as

S̃_n = U_n D̃_n U_n*, D̃_n = Diag(d_1, . . . , d_n). (3.12)

This type of estimator is sometimes also referred to as a shrinkage estimator, as its effect typically is to shrink the sample eigenvalues towards some intermediate value. This is exactly the type of problem that was studied in [56] under the so-called Stein's loss, which can be seen as the Kullback–Leibler distance between the true and the estimated covariance matrix under a Gaussian model. We will here instead follow [32] and [33] and study the Frobenius loss, i.e. L(A_n, Ã_n) = ‖A_n − Ã_n‖_F with ‖A_n‖_F = √(Tr A_n A_n*), which is simply the element-wise Euclidean distance between the true matrix and the estimator. The reason for this is mainly the simplicity of finding the optimal estimator within our class for this loss, and that it is simple to understand. By using other methods than the ones considered in this thesis, one can, as in [34], study Stein's loss instead. As mentioned, in all previous cases one has looked at rotationally equivariant estimators, thus enforcing B_n = I_n; however, there is no underlying principle constraining us to that case, even though the corresponding subclass for B_n ≠ I_n does not have any good interpretation.

For the Frobenius loss, the optimal estimator of A_n within the subclass of estimators having the same eigenvectors as S_n takes the form [31]

d_i = u_i* A_n u_i, ∀i = 1, . . . , n. (3.13)

Now (3.13) tells us how to find the optimal estimator, but it is an Oracle estimator, meaning that in order to calculate the estimator we need knowledge of a quantity that depends on the unknown covariance. The first step in the remedy to this is to consider the asymptotic behavior of this quantity. To do this we follow [31] and


define the seemingly related quantity

Δ_n(x) = (1/n) Σ_{i=1}^n d_i 1_{[λ_i,∞)}(x) = (1/n) Σ_{i=1}^n u_i* A_n u_i × 1_{[λ_i,∞)}(x).

From this we can deduce the value of d_i as

d_i = lim_{ε→0} [Δ_n(λ_i + ε) − Δ_n(λ_i − ε)] / [F_n(λ_i + ε) − F_n(λ_i − ε)], ∀i = 1, . . . , n. (3.14)
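At finite n, the Oracle coefficients (3.13) and the resulting in-class optimum can be computed directly whenever A_n is known. The following minimal sketch is our own illustration (with B_n = I_N and an assumed diagonal A_n, not a setup from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 100, 200
A = np.diag(rng.uniform(1.0, 3.0, n))          # true covariance (beta_n = 1)
X = rng.standard_normal((n, N))
S = np.sqrt(A) @ X @ X.T @ np.sqrt(A) / N      # sample covariance

lam, U = np.linalg.eigh(S)                     # sample eigenvalues/eigenvectors
d = np.einsum('ji,jk,ki->i', U, A, U)          # d_i = u_i^* A u_i   (3.13)
S_oracle = (U * d) @ U.T                       # U Diag(d) U^*

# the Oracle minimizes the Frobenius loss within the class, so it can
# never do worse than the sample covariance itself
assert np.linalg.norm(S_oracle - A) <= np.linalg.norm(S - A)
```

Since Diag(d) solves the least-squares problem for fixed eigenvectors, the assertion holds by construction for every sample.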

By using Theorem 3.2.1 with g(a) = a, we get the asymptotic behavior of Δ_n from

Δ_n(x) = lim_{η→0⁺} (1/π) ∫_{−∞}^x Im[θ_n^{(1)}(ξ + iη)] dξ,

and as shown in Section 3.6.2 we can prove the following theorem.

Theorem 3.3.1. Assume that Assumption 3.2.1 holds. Then Δ_n(x) → Δ(x) a.s. for all x ∈ R \ {0}. Further, if c ≠ 1 we have that Δ(x) = ∫_{−∞}^x δ(λ) dF(λ), where

δ(λ) = Im[e_F(λ)] / Im[m_F(λ)]   if λ > 0,
δ(λ) = (c/(1 − c)) e_F(0)        if λ = 0 and c > 1,
δ(λ) = 0                         otherwise, (3.15)

where

e_F(0) := lim_{ξ→0⁺} lim_{η→0⁺} (ξ + iη) e_F(ξ + iη),

and similarly in the elliptic case.

We can now use Theorem 3.3.1 together with (3.14) to get that the optimal estimator having the same eigenvectors as (3.1) is asymptotically

d_i = δ(λ_i).

What this means is that if we have any covariance estimator of the form (3.1), then the matrix which minimizes the Frobenius loss asymptotically and also satisfies (3.11) can be rewritten as (3.12) with d_i as above. Note again that it is still only an Oracle estimator, since it depends on the unknown Stieltjes transforms. In the case where B_n = I_N, this dependence was overcome in [33, 35] by replacing the Stieltjes transform with an unbiased estimator of it. This is something we are working on for a subsequent paper.

We will now do the same analysis for estimating the precision matrix, i.e. A_n⁻¹. This matrix is commonly used in practice, and many of the ideas regarding shrinkage and regularization methods were originally motivated by this problem. The reason for this is that if we were to estimate A_n⁻¹ as the inverse of the sample covariance matrix, the estimator would have a very high risk, as the sample covariance might be ill-conditioned. The covariance estimator will however be an estimator whose eigenvectors define the subclass we are optimizing over. So again any estimator within this similarity class can be written as (3.12). The optimal d_i in this case is simply

d_i = u_i* A_n⁻¹ u_i,

where A_n⁻¹ is the Moore–Penrose pseudoinverse. Following the covariance case, define

Ψ_n(x) := (1/n) Σ_{i=1}^n u_i* A_n⁻¹ u_i × 1_{[λ_i,∞)}(x).

Then we can also in this case see that asymptotically d_i can be written as

d_i = lim_{ε→0} [Ψ_n(λ_i + ε) − Ψ_n(λ_i − ε)] / [F_n(λ_i + ε) − F_n(λ_i − ε)], ∀i = 1, . . . , n.

By using Theorem 3.2.1 we get the asymptotic behavior of Ψ_n from

Ψ_n(x) = lim_{η→0⁺} (1/π) ∫_{−∞}^x Im[θ_n^{(−1)}(ξ + iη)] dξ, (3.16)

and in Section 3.6.2 we prove the following theorem.

Theorem 3.3.2. Assume that Assumption 3.2.1 holds. Then Ψ_n(x) → Ψ(x) a.s. for all x ∈ R \ {0}. When c ≠ 1 we have that Ψ(x) = ∫_{−∞}^x ψ(λ) dF(λ), where

ψ(λ) = (1/λ) Im[ m_F(λ)(λ·m_F(λ) + 1) / e_F(λ) ] / Im[m_F(λ)]   if λ > 0,
ψ(λ) = c⁻¹/e_F(0) + c·m_{H_A}(0)/(c − 1)                        if λ = 0 and c > 1,
ψ(λ) = 0                                                        otherwise, (3.17)

and similarly for the elliptic case.

Thus by setting d_i = ψ(λ_i), U_n D̃_n U_n* will converge to the optimal estimator of the precision matrix within the subclass of matrices having the same eigenvectors as the sample covariance matrix.

3.4 Applications

We will now return to the examples discussed in Section 3.2.1 and see how Theorems 3.3.1 and 3.3.2 are applicable to them. More exactly, we will see how the Oracle estimators found in the previous sections can be applied to these problems, and how well they perform for finite sample sizes in these contexts. Since we are looking at the Oracle estimators, we implicitly assume that we have full knowledge of the limiting e.s.d. of A_n and B_n.

3.4.1 Robust estimators for elliptical data

Recall from Section 3.2.1 that Maronna's M-estimator is defined as the solution to the equation

Â_n = N⁻¹ Σ_{i=1}^N u( (1/n) y_i* Â_n⁻¹ y_i ) y_i y_i*, (3.18)

where u : [0, ∞) → [0, ∞) is nonnegative, continuous and non-increasing, and y_i comes from an elliptic distribution of the form (3.2). Following [13] we define the functions

• φ(x) = x · u(x)
• g(x) = x/(1 − cφ(x))
• v(x) = (u ∘ g⁻¹)(x)
• χ(x) = x · v(x)

Let us henceforth specifically use u(x) = (1 + α)/(α + x) for some fixed α > 0. It was shown in [13] that, under assumptions similar to Assumption 3.2.1 and with the additional assumptions that c < 1 and d < 1, the estimator which is the solution to (3.18) converges to

Â_n → N⁻¹ Σ_{i=1}^N v(ζ_i² γ_∞) y_i y_i* = N⁻¹ Σ_{i=1}^N v(ζ_i² γ_∞) ζ_i² Σ x_i x_i* Σ*, (3.19)

where γ_∞ satisfies the equation

1 = ∫_R [ χ(bγ_∞) / (1 + c·χ(bγ_∞)) ] dH_B(b).
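In practice, (3.18) is computed by fixed-point iteration. The following is a hedged sketch of such an iteration for u(x) = (1 + α)/(α + x); the data-generating model, starting point, and tolerances are our own choices for illustration, not those used in the thesis:

```python
import numpy as np

def maronna(Y, alpha=0.1, max_iter=500, tol=1e-10):
    """Fixed-point iteration for (3.18) with u(x) = (1 + alpha)/(alpha + x)."""
    n, N = Y.shape
    C = np.eye(n)
    for _ in range(max_iter):
        # quadratic forms y_i^* C^{-1} y_i / n for all i at once
        q = np.einsum('in,ij,jn->n', Y, np.linalg.inv(C), Y) / n
        w = (1.0 + alpha) / (alpha + q)            # weights u(q_i)
        C_new = (Y * w) @ Y.T / N
        if np.linalg.norm(C_new - C) <= tol * np.linalg.norm(C):
            return C_new
        C = C_new
    return C

rng = np.random.default_rng(3)
n, N = 40, 120
zeta = np.abs(rng.standard_t(df=3, size=N))        # heavy-tailed radial factors
Y = rng.standard_normal((n, N)) * zeta             # elliptic-like data, shape = I
C_hat = maronna(Y)
assert np.all(np.linalg.eigvalsh(C_hat) > 0)       # a bona-fide covariance matrix
```

The down-weighting of samples with large Mahalanobis-type radius is what tames the heavy-tailed ζ factors, in line with the κ interpretation discussed next.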

Notice that by defining κ_i = v(ζ_i² γ_∞)^{1/2} ζ_i, we can see that (3.19) can be thought of as the sample covariance estimator of an elliptic model where we have replaced ζ by κ. Maronna's estimator can thus be thought of as the process of finding a new elliptical model where the variance of the non-standard variables has been reduced.

Next note that the κ_i's are asymptotically i.i.d. variables, and thus Theorem 3.3.1 is directly applicable to this situation, i.e. we can find the optimal estimator that has the same eigenvectors as Maronna's estimator. In order to apply it to a sample {y_i}_{i=1}^N, we however need to know the limiting distribution function of κ in addition to the limiting e.s.d. of A. We will thus assume in our numerical simulations that we have perfect knowledge of these, as we are only looking at the properties of the Oracle estimator for finite sample sizes.

We will also study the optimal shrinkage estimator within the class having the same eigenvectors as the sample covariance, and its corresponding precision matrix. In both these cases, we will use the percentage relative improvement in average loss (PRIAL)


as a metric [32], which we define as

PRIAL(M) = 100 × ( 1 − E[‖M − U_n D̃_n U_n*‖_F²] / E[‖REF − U_n D̃_n U_n*‖_F²] ),

where REF is the base estimator (i.e. either the sample covariance or Maronna's estimator) and U_n D̃_n U_n* is the optimal shrinkage for the corresponding base estimator. By definition we thus have that PRIAL(REF) = 0 and PRIAL(U_n D̃_n U_n*) = 100%. Hence PRIAL is a normalized measure of how close we are to the matrix which minimizes the average Frobenius loss within the similarity class. We know from Theorem 3.3.1 that as n → ∞ the PRIAL of our Oracle estimator will converge to 100%; the question is how it behaves for finite sample sizes.
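A minimal Monte-Carlo sketch of the PRIAL computation (our own toy setup with B_n = I_N; the in-class competitor here is a naive linear shrinkage of the sample eigenvalues, used only to exercise the metric):

```python
import numpy as np

def prial(loss_M, loss_ref):
    """PRIAL(M) = 100 * (1 - E||M - oracle||_F^2 / E||REF - oracle||_F^2)."""
    return 100.0 * (1.0 - np.mean(loss_M) / np.mean(loss_ref))

rng = np.random.default_rng(4)
n, N = 60, 150
A = np.diag(rng.uniform(1.0, 3.0, n))
loss_ref, loss_lin = [], []
for _ in range(50):
    X = rng.standard_normal((n, N))
    S = np.sqrt(A) @ X @ X.T @ np.sqrt(A) / N
    lam, U = np.linalg.eigh(S)
    d = np.einsum('ji,jk,ki->i', U, A, U)        # Oracle eigenvalues (3.13)
    oracle = (U * d) @ U.T
    loss_ref.append(np.linalg.norm(S - oracle) ** 2)
    d_lin = 0.5 * lam + 0.5 * lam.mean()         # naive in-class competitor
    loss_lin.append(np.linalg.norm((U * d_lin) @ U.T - oracle) ** 2)

assert prial(loss_ref, loss_ref) == 0.0          # the base estimator scores 0
assert prial(loss_lin, loss_ref) <= 100.0        # the Oracle itself would score 100
```

Any estimator sharing the eigenvectors of the base estimator can be scored this way against the in-class optimum.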

We will here let the ζ_i's come from a non-central t-distribution with non-centrality parameter 1 and 2 degrees of freedom, which we have scaled so as to have a mean of 1. The x_i's follow a uniform distribution on the scaled (n − 1)-dimensional sphere of radius N, and Σ is a fixed matrix whose entries were generated from standard Gaussians. We have fixed c = 0.4, α = 0.1 and d = 0.66. In Figure 3.1 we have plotted these two PRIALs for both the sample estimator and Maronna's estimator. We can see that even for small sample sizes, the shrinking formula is very close to the optimal solution within the similarity class. As for the sample estimator, its PRIAL is almost always close to 100%. The reason for this is that the sample estimator has, as previously explained, some very ill-behaved eigenvalues. Hence there is a lot of room for improvement, which means it is fairly easy to get a high PRIAL. As for Maronna's estimator, it already has fairly well-behaved eigenvalues, and there is thus less room for improvement.

If we look at the actual improvement in average risk, the shrunken sample estimator has a risk of about 30% of that of the non-shrunken version of Maronna's estimator. However, the best estimator is the one obtained by applying shrinkage to Maronna's estimator, which has about 60% less risk than the shrunken sample estimator. The interpretation of this is that the benefit of using Maronna's estimator in our context is that it finds eigenvectors which are closer to the population eigenvectors. The shrunken sample estimator however yields a superior performance


Figure 3.1: Plot of the PRIAL of the shrinkage estimator retrieved by using (3.15), for both the sample estimator and Maronna's estimator. Each point on the curves is generated by using 200 simulations from the same model. Each model is generated by setting Σ equal to a matrix with standard Gaussian entries.

over the vanilla Maronna estimator.

Likewise, we can do the same process to estimate the precision matrix. We can also compare this shrinkage estimator to the estimator obtained by taking the inverse of the shrinkage covariance estimator.

In Figure 3.2 we have plotted the results of trying to estimate the precision matrix. We can see here that by using (3.17), instead of simply taking the inverse of (3.15), one obtains a higher PRIAL. Note that in this situation the difference in average risk between the base estimator and the optimal estimator is about a factor of 4 on average, whereas in the covariance case it was about a factor of 2. Hence the improvement of the shrinkage estimator over the non-shrunken estimator is substantially larger in this case.

3.4.2 Spatio-temporal data

Using the model discussed in Section 3.2.1, our goal here is to find an estimator of A_n ⊗ B_n. Also in this case we have to assume that the limiting e.s.d. of A_n and B_n are known. If we define α_n = Tr(A_n)/n and keep our assumption that β_n = 1, we get from the symmetry of the problem that E[c⁻¹ S̄_n] = α_n × B_n and, as before, E[S_n] = A_n.


Figure 3.2: Plot of the PRIAL of the shrinkage estimators retrieved by using (3.17), for both the sample estimator (left) and Maronna's estimator (right). We also compare these to the matrix obtained by taking the inverse of the shrunken covariance matrix for the corresponding base matrix. Each point on the curves is generated by using 200 simulations from the same model. Each model is generated by setting Σ equal to a matrix with standard Gaussian entries.

We can thus apply Theorem 3.3.1 to both S_n and S̄_n separately and then put them together into one estimator. Our estimator thus takes the form

\widehat{A_n ⊗ B_n} = (1/(c·α_n)) [ \tilde{S}_n ⊗ \tilde{\bar{S}}_n ], (3.20)

where \tilde{S}_n and \tilde{\bar{S}}_n denote the shrunken versions of S_n and S̄_n. Here α_n comes from our assumption that we can unbiasedly estimate the spectra of A_n and B_n as in [35], and thus can estimate these quantities with low variance. Obviously we can no longer conclude any kind of optimality of the estimator over some similarity class, as we are simply taking the Kronecker product of two estimators which are optimal for their individual estimation problems.

Notice that there exist estimators adapted to this type of problem [44]; however, it is outside the scope of this study to discuss them and adapt our methodology to them. We are simply interested in seeing that the framework is usable in this situation. A numerical problem now is that we cannot calculate the optimal shrinkage estimator directly in any standard scripting language, as for any reasonably chosen n the dimension of A_n ⊗ B_n will be too large to handle, and more so to find the eigendecomposition of.
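One standard way around this size problem, sketched below on a small example of our own, is never to form A_n ⊗ B_n explicitly: the eigenvalues of a Kronecker product are the pairwise products of the factor eigenvalues, and its eigenvectors are Kronecker products of the factor eigenvectors, so spectral computations can be done factor-wise:

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 12, 15                                  # small so the direct check is feasible
Sa = np.cov(rng.standard_normal((n, 4 * n)))   # stand-in for the shrunken A-factor
Sb = np.cov(rng.standard_normal((N, 4 * N)))   # stand-in for the shrunken B-factor

la, Ua = np.linalg.eigh(Sa)
lb, Ub = np.linalg.eigh(Sb)

# spectrum of Sa ⊗ Sb from the factors alone -- no (nN) x (nN) matrix needed
kron_eigs = np.sort(np.outer(la, lb).ravel())

# spot-check against the explicit Kronecker product on this small example
direct = np.sort(np.linalg.eigvalsh(np.kron(Sa, Sb)))
assert np.allclose(kron_eigs, direct)
```

For realistic n and N only the two factor decompositions (n×n and N×N) are ever computed, which is why we evaluate the two factors separately below.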


Hence we will look at the PRIAL metric of A and B separately. In Figure 3.3 we show the result of this when both A_n and B_n are generated from products of rectangular Gaussian matrices with twice as many columns as rows, and the X_{i,j}'s are standard Gaussians. We can see that each of these separately converges to the optimal value in the equivalence class.

Figure 3.3: Plot of the PRIAL of the shrinkage estimators for A_n (left) and B_n (right) when using the sample estimator as the base estimator. Each point on the curves is generated by using 200 simulations from the same model. The population A and B are generated in the same way as Σ in the previous section.

3.5 Conclusions

We have extended the results of [31] to apply to a larger class of models. Our hope is that this chapter of the thesis, just as [31], finds many practical applications, as the models have a high flexibility, as shown. We have here shown how the result can be used both for robust covariance estimators in elliptical situations and for spatio-temporal models. When used for covariance matrices in elliptical models, we have shown that when our results are applied to the sample covariance, the final estimator has a lower average risk than some commonly used robust estimators. If our results are applied to these robust estimators instead, we end up with an even lower average risk.

In order to use these estimators for practical applications, we first have to make them bona fide. This is something we are working on in a subsequent paper, using the ideas of [18, 33, 35] to remove the Oracle quantities. We are also planning to extend the approach to loss functions other than the Frobenius loss, as in [34].

3.6 Appendix

We will in this appendix prove all the theorems of this chapter. In Section 3.6.1 we prove Theorem 3.2.1, and in Section 3.6.2 we prove Theorems 3.3.1 and 3.3.2. The last two are put in the same section as their proofs resemble each other.

3.6.1 Proof of Theorem 3.2.1

To prove Theorem 3.2.1 we will mimic the proof of Theorem 2 in [31]. For convenience we will also drop the indices of most n-dependent variables where no confusion can arise. As Corollary 4.6.3 of [62] already shows a stronger result than Lemmas 2–4 of [31], we will start by showing the following lemma.

Lemma 3.6.1. Theorem 3.2.1 holds for any function g that is continuous and bounded on [h_{A,1}, h_{A,2}].

Proof of Lemma 3.6.1. The claim follows directly from Corollary 4.6.3 of [62] used together with the Weierstrass approximation theorem.

By using Lemma 3.6.1 we can now prove Theorem 3.2.1 by induction on the number of discontinuities of the function g.

Proof of Theorem 3.2.1. We already know from Lemma 3.6.1 that the claim holds in the case of 0 discontinuities. As we want to do induction on the number of discontinuities, assume that the claim holds when we have k discontinuities. Next let g(x) be any bounded function on [h_{A,1}, h_{A,2}] having k + 1 discontinuity points, and let v be an arbitrary one of these points. Our goal is now to show that our assumption implies that the claim holds also in this case.


To do this, construct the function ρ(x) = g(x) × (x − v), which by definition has k discontinuities. By using the induction assumption we get that

θ^ρ(z) = ∫_R [ ρ(a) / ( a ∫_R b/(1 + bc·e_F(z)) dH_B(b) − z ) ] dH_A(a), ∀z ∈ C⁺. (3.21)

We can then, just as [31], adapt Theorem 4.1.1 from [62] to show that θ^g(z) = lim_{n→∞} θ_n^g(z) exists and satisfies

θ^g(z) = ( θ^ρ(z) − [∫_R b/(1 + c·θ^{(1)}(z)·b) dH_B(b)]⁻¹ ∫_R g(a) dH_A(a) ) / ( z·[∫_R b/(1 + c·θ^{(1)}(z)·b) dH_B(b)]⁻¹ − v ), ∀z ∈ C⁺. (3.22)

By inserting the form of θ^ρ(z) from (3.21) into (3.22) we get that

θ^g(z) = ( ∫_R g(a) ( −v + z·[∫_R b/(1 + c·θ^{(1)}(z)·b) dH_B(b)]⁻¹ ) / ( a ∫_R b/(1 + bc·e_F(z)) dH_B(b) − z ) dH_A(a) ) / ( z·[∫_R b/(1 + c·θ^{(1)}(z)·b) dH_B(b)]⁻¹ − v ) (3.23)

= ∫_R [ g(a) / ( a ∫_R b/(1 + bc·e_F(z)) dH_B(b) − z ) ] dH_A(a). (3.24)

Hence we have shown that the claim holds for any function with k + 1 discontinuities, and by induction we can conclude that the theorem holds under Assumption 3.2.1.

3.6.2 Proof of Theorems 3.3.1 & 3.3.2

To show Theorems 3.3.1 and 3.3.2 we start by showing a collection of lemmas, most of which are modifications of lemmas from [31] to our assumptions.

Lemma 3.6.2 (Lemma 6 of [31]). Let g denote a (real-valued) bounded function on [h_{A,1}, h_{A,2}] with finitely many points of discontinuity. Consider the function Φ_n^g defined by

Φ_n^g(x) = n⁻¹ Σ_{i=1}^n 1_{[λ_i,+∞)}(x) Σ_{j=1}^n |u_i* v_j|² × g(τ_{A,j}).

Then there exists a nonrandom function Φ^g defined on R such that Φ_n^g(x) → Φ^g(x) a.s. at all points of continuity of Φ^g. Furthermore,

Φ^g(x) = lim_{η→0⁺} (1/π) ∫_{−∞}^x Im[θ^g(λ + iη)] dλ (3.25)

for all x where Φ^g is continuous.

Proof of Lemma 3.6.2. Following [31], we note that θ_n^g is by definition the Stieltjes transform of the measure induced by Φ_n^g, so by Theorem 3.2.1 we know that there exists a nonrandom θ^g such that θ_n^g(z) → θ^g(z) a.s. for all z ∈ C⁺. Now use Theorem 2.5 from [53], which is a result on the convergence of Stieltjes transforms and thus is independent of the exact class of measures, to get that Φ_n^g(x) → Φ^g(x) a.s. for all continuity points x of Φ^g. At last, equation (3.25) is simply a consequence of the inversion formula for Stieltjes transforms.

We next show that Lemma 7 of [31] also holds in this case.

Lemma 3.6.3. Under the assumptions of Lemma 3.6.2, if c < 1 then for all (x_1, x_2) ∈ R²:

Φ^g(x_2) − Φ^g(x_1) = (1/π) ∫_{x_1}^{x_2} lim_{η→0⁺} Im[θ^g(λ + iη)] dλ. (3.26)

If c > 1, then equation (3.26) holds for all (x_1, x_2) ∈ R² such that x_1 x_2 > 0.

Proof of Lemma 3.6.3. We will first show that lim_{η→0⁺} Im[θ^g(x + iη)] exists for all x ∈ R when c < 1 and for all x ∈ R \ {0} when c > 1. By looking at (3.10) this is obvious if x ∈ supp(F). If x ∉ supp(F), we know from Proposition 3.1 in [11] that x / ∫_R b/(1 + bc·e_F(x)) dH_B(b) ∉ supp(H_A). Hence by using (3.10) the first claim follows.

Next note that Theorem 3.1 in [11] shows that Theorem 2.1 from [54] holds for this model, so we get that Φ^g is continuous with derivative π⁻¹ Im[θ^g(x)] for all x ∈ R when c < 1 and for all x ∈ R \ {0} when c > 1. By integrating this we get the claim.

Lemma 3.6.4. The eigenvalues of S_n are non-decreasing functions of the eigenvalues of A and B.

Proof of Lemma 3.6.4. We will show the lemma by perturbing the eigenvalues of B. The same idea can be applied to the eigenvalues of A by looking at the eigenvalues of c·S̄_n and using the connection between the eigenvalues of S̄_n and S_n. Let us now perturb the kth eigenvalue of B by adding any ε > 0, i.e. replace B by B̃, where B̃ is identical to B except that its kth eigenvalue equals τ_{B,k} + ε. From this we get that

A^{1/2} X B̃ X* A^{1/2} = A^{1/2} X B X* A^{1/2} + ε A^{1/2} X v_{B,k} v_{B,k}* X* A^{1/2},

and thus for any vector x ∈ Rⁿ,

x* A^{1/2} X B̃ X* A^{1/2} x ≥ x* A^{1/2} X B X* A^{1/2} x.

Hence we can let x be the eigenvector ũ_min corresponding to the smallest eigenvalue λ̃_min of A^{1/2} X B̃ X* A^{1/2}. By positive definiteness and the definition of the minimal eigenvalue we get that

λ̃_min = ũ_min* A^{1/2} X B̃ X* A^{1/2} ũ_min ≥ ũ_min* A^{1/2} X B X* A^{1/2} ũ_min ≥ u_min* A^{1/2} X B X* A^{1/2} u_min = λ_min,

where u_min is the eigenvector corresponding to the smallest eigenvalue λ_min of A^{1/2} X B X* A^{1/2}. Hence we have shown the claim.

Lemma 3.6.5. If c > 1, then F is constant on (0, (1 − c^{−1/2})² h_{A,1} h_{B,1}).

Proof of Lemma 3.6.5. We know that if A = h_{A,1} × I and B = h_{B,1} × I, then the infimum of the support of the e.s.d. of the nonzero eigenvalues converges to (1 − √c)² × h_{A,1} h_{B,1} (see [3]). As the non-zero eigenvalues of any other A and B are greater than or equal to h_{A,1} and h_{B,1} respectively, we can use Lemma 3.6.4 to get that the non-zero eigenvalues of N⁻¹ A^{1/2} X B X* A^{1/2} are greater than or equal to those of h_{A,1} h_{B,1} N⁻¹ X X* for any finite n. As we also know that the e.s.d. of h_{A,1} h_{B,1} N⁻¹ X X* converges a.s., we can conclude that the infimum of the support for N⁻¹ A^{1/2} X B X* A^{1/2} is greater than or equal to that of h_{A,1} h_{B,1} N⁻¹ X X*.

Lemma 3.6.6. If c > 1 and g is a bounded real-valued function defined on [h_{A,1}, h_{A,2}] with finitely many points of discontinuity, then

lim_{ε→0⁺} lim_{η→0⁺} ∫_{−ε}^{+ε} ∫_R Im[ g(a) / ( a ∫_R b/(1 + bc·e_F(ξ + iη)) dH_B(b) − ξ − iη ) ] dH_A(a) dξ

= lim_{ε→0⁺} lim_{η→0⁺} ∫_{−ε}^{+ε} Im[θ^g(ξ + iη)] dξ

= ∫_R [ g(a) / ( 1 − a/(c·e_F(0)) ) ] dH_A(a),

where F̄ = (1 − c)1_{[0,∞)} + cF and e_F(0) := lim_{ξ→0⁺} lim_{η→0⁺} (ξ + iη) e_F(ξ + iη).

Proof of Lemma 3.6.6. Notice first that from [62] we get that

∫_R b/(1 + bc·e_F(z)) dH_B(b) = (z·m_F(z) + 1)/e_F(z).

Now from the definition of m_{F̄}(z) we have that

m_F(z) = (c⁻¹ − 1)/z + c⁻¹ m_{F̄}(z).

As we know that m_{F̄}(0) := lim_{ξ→0⁺} lim_{η→0⁺} m_{F̄}(ξ + iη) exists, this implies from the above that

m_F(0) := lim_{ξ→0⁺} lim_{η→0⁺} (ξ + iη) m_F(ξ + iη) = c⁻¹ − 1

also exists. From Theorem 4.1.1 of [62] we get that

m_F(0) = lim_{ξ→0⁺} lim_{η→0⁺} ∫_R [ 1 / ( a ∫_R b/( (ξ + iη) + bc[(ξ + iη)·e_F(ξ + iη)] ) dH_B(b) − 1 ) ] dH_A(a), (3.27)

so this combined with the existence of m_F(0) yields that

e_F(0) := lim_{ξ→0⁺} lim_{η→0⁺} (ξ + iη) e_F(ξ + iη)

is well defined. Using this fact we can now take the limit in (3.27); we get that e_F(0) satisfies the equation

1 − c⁻¹ = ∫_R [ 1 / ( 1 − a/(c·e_F(0)) ) ] dH_A(a).

By defining

μ(z) := ∫_R [ g(a) / ( −a ∫_R b/( z + bc·z·e_F(z) ) dH_B(b) + 1 ) ] dH_A(a),

we can see that

μ(0) := lim_{ξ→0⁺} lim_{η→0⁺} μ(ξ + iη) = ∫_R [ g(a) / ( 1 − a/(c·e_F(0)) ) ] dH_A(a).

Hence by using Lemma 9 of [31] we can see that

lim_{ε→0⁺} lim_{η→0⁺} ∫_{−ε}^{+ε} ∫_R Im[ g(a) / ( a ∫_R b/(1 + bc·e_F(ξ + iη)) dH_B(b) − ξ − iη ) ] dH_A(a) dξ

= lim_{ε→0⁺} lim_{η→0⁺} ∫_{−ε}^{+ε} Im[ −(1/(ξ + iη)) ∫_R g(a) / ( −a ∫_R b/( (ξ + iη) + bc(ξ + iη)·e_F(ξ + iη) ) dH_B(b) + 1 ) dH_A(a) ] dξ

= lim_{ε→0⁺} lim_{η→0⁺} ∫_{−ε}^{+ε} Im[ −μ(ξ + iη)/(ξ + iη) ] dξ

= μ(0) = ∫_R [ g(a) / ( 1 − a/(c·e_F(0)) ) ] dH_A(a).

Notice that in the case where B = I we can use Lemma 2 of [31] to get that −1/(c·e_F(0)) = m_{F̄}(0), and hence this lemma generalizes their Lemma 10.

We now have all we need to proceed with the proof of Theorem 3.3.1.


Proof of Theorem 3.3.1. Start by recalling that

Δ_n(x) = lim_{η→0⁺} (1/π) ∫_{−∞}^x Im[θ_n^{(1)}(λ + iη)] dλ.

Hence we can use Lemma 3.6.2 to deduce that

Δ(x) = lim_{n→∞} Δ_n(x) = lim_{η→0⁺} (1/π) ∫_{−∞}^x Im[θ^{(1)}(λ + iη)] dλ

for all x ∈ R where Δ(x) is continuous. In the case when c < 1 we can use the definition e_F(z) = θ^{(1)}(z) and Lemma 3.6.3 to get

Δ(x) = lim_{η→0⁺} (1/π) ∫_{−∞}^x Im[e_F(λ + iη)] dλ = ∫_{−∞}^x lim_{η→0⁺} ( Im[e_F(λ + iη)] / Im[m_F(λ)] ) (1/π) Im[m_F(λ)] dλ

= ∫_{−∞}^x lim_{η→0⁺} ( Im[e_F(λ + iη)] / Im[m_F(λ)] ) dF(λ) = ∫_{−∞}^x ( Im[e_F(λ)] / Im[m_F(λ)] ) dF(λ),

which shows the claim for c < 1. For the case of c > 1, we start by using Lemma 3.6.3 and Lemma 3.6.6 to get that

lim_{ε→0⁺} [Δ(ε) − Δ(−ε)] = lim_{ε→0⁺} (1/π) ∫_{−ε}^{ε} lim_{η→0⁺} Im[θ^{(1)}(ξ + iη)] dξ = ∫_R [ a / ( 1 − a/(c·e_F(0)) ) ] dH_A(a).

By now proceeding as in the proof of Lemma 3.6.6 and calculating e_F(0) using (3.4), we get that

e_F(0) = lim_{ε→0⁺} lim_{η→0⁺} ∫_R [ a / ( a ∫_R b/( (ξ + iη) + bc[(ξ + iη)·e_F(ξ + iη)] ) dH_B(b) − 1 ) ] dH_A(a) = ∫_R [ a / ( a/(c·e_F(0)) − 1 ) ] dH_A(a),

and thus if we combine the last two equations we get

lim_{ε→0⁺} [Δ(ε) − Δ(−ε)] = −e_F(0).

From this and the positive support of the eigenvalues we get that in a neighborhood of 0 it holds that

Δ(x) = ∫_{−∞}^x −e_F(0) d1_{[0,+∞)}(λ).

We also know from Lemma 3.6.5 that F(λ) = (1 − c⁻¹) 1_{[0,+∞)}(λ) in a neighborhood of 0, so by combining these last two results we get

Δ(x) = ∫_{−∞}^x ( c·e_F(0)/(1 − c) ) dF(λ),

which shows the claim for the case of c > 1 and x = 0. When c > 1 and x > 0 we can proceed in the same way as in the c < 1 case. Thus we have shown the claim.

We will now prove Theorem 3.3.2 in a similar way.

Proof of Theorem 3.3.2. Recall from (3.16) that

Ψ_n(x) = lim_{η→0⁺} (1/π) ∫_{−∞}^x Im[θ_n^{(−1)}(ξ + iη)] dξ,

as θ_n^{(−1)}(z) = θ_n^g(z) with g(x) = 1/x. Since g is continuous on [h_{A,1}, h_{A,2}], we know from Theorem 3.2.1 that

θ^{(−1)}(z) = lim_{n→∞} θ_n^{(−1)}(z) = ∫_R [ a⁻¹ / ( a ∫_R b/(1 + bc·e_F(z)) dH_B(b) − z ) ] dH_A(a)

for all z ∈ C⁺. Thus we know from Lemma 3.6.2 that

Ψ(x) = lim_{η→0⁺} (1/π) ∫_{−∞}^x Im[θ^{(−1)}(ξ + iη)] dξ

for all x ∈ R where Ψ(x) is continuous. Now start by assuming that c < 1. By using the same idea as in the proof of Theorem 3.2.1, set ρ(a) = g(a) × a = 1 and use (3.22) to get

θ^{(−1)}(z) = ( m_F(z) − [∫_R b/(1 + c·e_F(z)·b) dH_B(b)]⁻¹ ∫_R λ⁻¹ dH_A(λ) ) / ( z·[∫_R b/(1 + c·e_F(z)·b) dH_B(b)]⁻¹ ) (3.28)

= m_F(z)[z·m_F(z) + 1]/(z·e_F(z)) − (1/z) ∫_R λ⁻¹ dH_A(λ), (3.29)

where we have also used [62]. Hence we get that, for all λ ∈ R,

lim_{η→0⁺} Im[θ^{(−1)}(λ + iη)] = (1/λ) Im[ m_F(λ)[λ·m_F(λ) + 1] / e_F(λ) ] = π ( (1/λ) Im[ m_F(λ)[λ·m_F(λ) + 1] / e_F(λ) ] / Im[m_F(λ)] ) F′(λ),

so by substituting this into the definition of Ψ(x) we get

Ψ(x) = ∫_{−∞}^x (1/λ) ( Im[ m_F(λ)[λ·m_F(λ) + 1] / e_F(λ) ] / Im[m_F(λ)] ) dF(λ),

which shows the theorem for c < 1. Now for the case of c > 1 we will proceed in the same way as in the proof of Theorem 3.3.1. Thus first use Lemma 3.6.3 to get

lim_{ε→0⁺} [Ψ(ε) − Ψ(−ε)] = lim_{ε→0⁺} (1/π) ∫_{−ε}^{ε} lim_{η→0⁺} Im[θ^{(−1)}(ξ + iη)] dξ.

Next rewrite equation (3.28) as

θ^{(−1)}(z) = (1/z) [ z·m_F(z)[z·m_F(z) + 1]/(z·e_F(z)) − m_{H_A}(0) ].

Now set μ(z) = −z·m_F(z)[z·m_F(z) + 1]/(z·e_F(z)) + m_{H_A}(0) and note that

μ(0) := lim_{ξ→0⁺} lim_{η→0⁺} μ(ξ + iη) = −m_F(0)[m_F(0) + 1]/e_F(0) + m_{H_A}(0) = −c⁻¹[c⁻¹ − 1]/e_F(0) + m_{H_A}(0).

By using Lemma 9 of [31] again we get that

lim_{ε→0⁺} [Ψ(ε) − Ψ(−ε)] = lim_{ε→0⁺} (1/π) ∫_{−ε}^{ε} lim_{η→0⁺} Im[ −μ(ξ + iη)/(ξ + iη) ] dξ = −c⁻¹[c⁻¹ − 1]/e_F(0) + m_{H_A}(0).

From this and the positive support it follows that in a neighborhood of 0 it holds that

Ψ(x) = ∫_{−∞}^x ( −c⁻¹[c⁻¹ − 1]/e_F(0) + m_{H_A}(0) ) d1_{[0,+∞)}(λ).

We also know from Lemma 3.6.5 that in a neighborhood of 0 we have F(λ) = (1 − c⁻¹) 1_{[0,+∞)}(λ), and by combining these we end up with

Ψ(x) = ∫_{−∞}^x ( c⁻¹/e_F(0) + c·m_{H_A}(0)/(c − 1) ) dF(λ),

which shows the claim for the case of c > 1 and x = 0. We can then show the case of x > 0 in the same way as in the c < 1 case, and thus the full claim follows.


Bibliography

[1] Romain Allez and Jean-Philippe Bouchaud. Eigenvector dynamics: general theory and some applications. Physical Review E, 86(4):046202, 2012.

[2] Theodore Wilbur Anderson. Asymptotic theory for principal component analysis. Annals of Mathematical Statistics, pages 122–148, 1963.

[3] Zhidong Bai and Jack W Silverstein. Spectral analysis of large dimensional random matrices, volume 20. Springer, 2010.

[4] Peter J Bickel and Elizaveta Levina. Regularized estimation of large covariance matrices. The Annals of Statistics, pages 199–227, 2008.

[5] Tim Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327, 1986.

[6] Marie-France Bru. Wishart processes. Journal of Theoretical Probability, 4(4):725–751, 1991.

[7] Joel Bun, Romain Allez, Jean-Philippe Bouchaud, and Marc Potters. Rotational invariant estimator for general noisy matrices. arXiv preprint arXiv:1502.06736, 2015.

[8] Yilun Chen, Ami Wiesel, and Alfred O Hero. Robust shrinkage estimation of high-dimensional covariance matrices. IEEE Transactions on Signal Processing, 59(9):4097–4107, 2011.


[9] Thomas Conlon, Heather J Ruskin, and Martin Crane. Cross-correlation dynamics in financial time series. Physica A: Statistical Mechanics and its Applications, 388(5):705–714, 2009.

[10] Romain Couillet, Merouane Debbah, and Jack W Silverstein. A deterministic equivalent for the analysis of correlated MIMO multiple access channels. IEEE Transactions on Information Theory, 57(6):3493–3514, 2011.

[11] Romain Couillet and Walid Hachem. Analysis of the limiting spectral measure of large random matrices of the separable covariance type. Random Matrices: Theory and Applications, 3(04), 2014.

[12] Romain Couillet and Matthew McKay. Large dimensional analysis and optimization of robust shrinkage covariance matrix estimators. Journal of Multivariate Analysis, 131:99–120, 2014.

[13] Romain Couillet, Frederic Pascal, and Jack W Silverstein. The random matrix regime of Maronna's M-estimator with elliptically distributed samples. arXiv preprint arXiv:1311.7034, 2013.

[14] Dipak K Dey and C Srinivasan. Estimation of a covariance matrix under Stein's loss. The Annals of Statistics, pages 1581–1591, 1985.

[15] David L Donoho, Matan Gavish, and Iain M Johnstone. Optimal shrinkage of eigenvalues in the spiked covariance model. arXiv preprint arXiv:1311.0851, 2013.

[16] Darrell Duffie. Dynamic asset pricing theory. Princeton University Press, 2010.

[17] Freeman J Dyson. A Brownian-motion model for the eigenvalues of a random matrix. Journal of Mathematical Physics, 3(6):1191–1198, 1962.

[18] Noureddine El Karoui. Spectrum estimation for large dimensional covariance matrices using random matrix theory. The Annals of Statistics, pages 2757–2790, 2008.


[19] Noureddine El Karoui. Concentration of measure and spectra of random matrices: applications to correlation matrices, elliptical distributions and beyond. The Annals of Applied Probability, 19(6):2362–2405, 2009.

[20] Robert F Engle and Kevin Sheppard. Theoretical and empirical properties of dynamic conditional correlation multivariate GARCH. Technical report, National Bureau of Economic Research, 2001.

[21] Jianqing Fan, Yingying Fan, and Jinchi Lv. High dimensional covariance matrix estimation using a factor model. Journal of Econometrics, 147(1):186–197, 2008.

[22] William Feller. Two singular diffusion problems. Annals of Mathematics, pages 173–182, 1951.

[23] Gabriel Frahm and Uwe Jaekel. Random matrix theory and robust covariance matrix estimation for financial data. arXiv preprint physics/0503007, 2005.

[24] LR Haff. Empirical Bayes estimation of the multivariate normal covariance matrix. The Annals of Statistics, pages 586–597, 1980.

[25] Tiefeng Jiang. The limiting distributions of eigenvalues of sample correlation matrices. Sankhya: The Indian Journal of Statistics, pages 35–48, 2004.

[26] Iain M Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, pages 295–327, 2001.

[27] Tonu Kollo and Heinz Neudecker. Asymptotics of eigenvalues and unit-length eigenvectors of sample variance and correlation matrices. Journal of Multivariate Analysis, 47(2):283–300, 1993.

[28] Sadanori Konishi et al. Asymptotic expansions for the distributions of statistics based on the sample correlation matrix in principal component analysis. Hiroshima Mathematical Journal, 9(3):647–700, 1979.

[29] Tze Leung Lai and Haipeng Xing. Statistical models and methods for financial markets. Springer, 2008.


[30] Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters. Noise dressing of financial correlation matrices. Physical Review Letters, 83(7):1467, 1999.

[31] Olivier Ledoit and Sandrine Péché. Eigenvectors of some large sample covariance matrix ensembles. Probability Theory and Related Fields, 151(1-2):233–264, 2011.

[32] Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004.

[33] Olivier Ledoit and Michael Wolf. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Institute for Empirical Research in Economics, University of Zurich Working Paper, (515), 2011.

[34] Olivier Ledoit and Michael Wolf. Optimal estimation of a large-dimensional covariance matrix under Stein's loss. University of Zurich Department of Economics Working Paper, (122), 2013.

[35] Olivier Ledoit and Michael Wolf. Spectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions. Available at SSRN 2198287, 2013.

[36] Olivier Ledoit and Michael Wolf. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. Available at SSRN 2383361, 2014.

[37] Haoyang Liu, Alexander Aue, Debashis Paul, et al. On the Marčenko–Pastur law for linear time series. The Annals of Statistics, 43(2):675–712, 2015.

[38] Yannick Malevergne and D Sornette. Collective origin of the coexistence of apparent random matrix theory noise and of factors in large sample correlation matrices. Physica A: Statistical Mechanics and its Applications, 331(3):660–668, 2004.


[39] Vladimir Alexandrovich Marchenko and Leonid Andreevich Pastur. Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 114(4):507–536, 1967.

[40] Ricardo A Maronna and Víctor J Yohai. Robust estimation of multivariate location and scatter. Encyclopedia of Statistical Sciences, 1998.

[41] Alexander J McNeil, Rüdiger Frey, and Paul Embrechts. Quantitative risk management: concepts, techniques, and tools. Princeton University Press, 2010.

[42] Attilio Meucci. Risk and asset allocation. Springer Science & Business Media, 2009.

[43] JP Morgan. RiskMetrics: technical document. Morgan Guaranty Trust Company of New York, 1996.

[44] Dayanand N Naik and Shantha S Rao. Analysis of multivariate repeated measures data with a Kronecker product structured covariance matrix. Journal of Applied Statistics, 28(1):91–105, 2001.

[45] Bernt Øksendal. Stochastic differential equations: an introduction with applications, volume 5. Springer New York, 1992.

[46] Debashis Paul and Jack W Silverstein. No eigenvalues outside the support of the limiting empirical spectral distribution of a separable covariance matrix. Journal of Multivariate Analysis, 100(1):37–57, 2009.

[47] Oliver Pfaffel and Eckhard Schlemm. Eigenvalue distribution of large sample covariance matrices of linear processes. arXiv preprint arXiv:1201.3828, 2012.

[48] Vasiliki Plerou, Parameswaran Gopikrishnan, Bernd Rosenow, Luis A Nunes Amaral, Thomas Guhr, and H Eugene Stanley. Random matrix approach to cross correlations in financial data. Physical Review E, 65(6):066126, 2002.

[49] Boris Podobnik, Ivo Grosse, Davor Horvatic, S Ilic, P Ch Ivanov, and H Eugene Stanley. Quantifying cross-correlations using local and global detrending approaches. The European Physical Journal B-Condensed Matter and Complex Systems, 71(2):243–250, 2009.

[50] Marc Potters, Jean-Philippe Bouchaud, and Laurent Laloux. Financial applications of random matrix theory: Old laces and new pieces. arXiv preprint physics/0507111, 2005.

[51] Angelika Rohde, Alexandre B Tsybakov, et al. Estimation of high-dimensional low-rank matrices. The Annals of Statistics, 39(2):887–930, 2011.

[52] Leonidas Sandoval and Italo De Paula Franca. Correlation of financial markets in times of crisis. Physica A: Statistical Mechanics and its Applications, 391(1):187–208, 2012.

[53] Jack W Silverstein. Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. Journal of Multivariate Analysis, 55(2):331–339, 1995.

[54] Jack W Silverstein and Sang-Il Choi. Analysis of the limiting spectral distribution of large dimensional random matrices. Journal of Multivariate Analysis, 54(2):295–309, 1995.

[55] Erinaldo Leite Siqueira, Tatijana Stosic, Lucian Bejan, and Borko Stosic. Correlations and cross-correlations in the Brazilian agrarian commodities and stocks. Physica A: Statistical Mechanics and its Applications, 389(14):2739–2743, 2010.

[56] Charles Stein. Some problems in multivariate analysis, Part I. Technical report, DTIC Document, 1956.

[57] Charles Stein. Estimation of a covariance matrix. In Rietz Lecture, 39th Annual Meeting, Atlanta, GA, volume 4, 1975.

[58] David E Tyler. A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, pages 234–251, 1987.


[59] Hans Von Storch and Francis W Zwiers. Statistical analysis in climate research. Cambridge University Press, 2001.

[60] Duan Wang, Boris Podobnik, Davor Horvatic, and H Eugene Stanley. Quantifying and modeling long-range cross correlations in multiple time series with applications to world stock indices. Physical Review E, 83(4):046121, 2011.

[61] Joong-Ho Won, Johan Lim, Seung-Jean Kim, and Bala Rajaratnam. Condition-number-regularized covariance estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3):427–450, 2013.

[62] L Zhang. Spectral analysis of large dimensional random matrices. PhD thesis, National University of Singapore, 2006.

[63] Gilles Zumbach. Empirical properties of large covariance matrices. Quantitative Finance, 11(7):1091–1102, 2011.