
Principal Component Analysis and Extreme Value Theory in Financial Applications

by

Julian Rachlin

A Thesis submitted to the faculty of Princeton

University in partial fulfillment of the

requirements for the degree of Bachelor of Arts

Department of Physics

April 27, 2006

Thesis Adviser: Gyan Bhanot (IAS)

Departmental Representative: Professor Chiara Nappi

Second Reader: Professor Robert Vanderbei

This paper represents my own work in accordance with university

regulations


Abstract

This paper examines the potential of applying two mathematical methods,

Extreme Value Theory and Principal Component Analysis already in use in

astrophysics, to the field of finance. Following the work of Cici Muldoon,

Extreme Value Theory will be used to create a quantitative stock trading

strategy, the merits of which will be judged by ‘back-checking’ using S&P

500 returns from January 1985 to December 2004. These same returns

will then be subjected to scrutiny using principal component analysis with

the objective of discovering underlying market structure or useful trading

information.


Acknowledgements

Sincere thanks to all those who lent their time and support to this project.

First and foremost among this group is my advisor Gyan Bhanot. Thank

you for your guidance, patience, and limitless enthusiasm for the project.

Thank you also for taking the time to meet with me weekly and encouraging

me to continue to work to expand the scope of the investigation throughout

the year.

Further recognition belongs to my departmental advisor Chiara Nappi, my

second reader Robert Vanderbei, and finally Professor Michael Strauss for

his help regarding the astrophysical aspects of this project.

Finally, I’d like to thank my family for all their support throughout my

career at Princeton.


Contents

1 Introduction

2 Extreme Value Theory
2.1 Introduction
2.2 Method
2.3 Results
2.4 Conclusion

3 Principal Component Analysis
3.1 Introduction
3.2 Mathematical Description of Basic Method
3.2.1 Geometric Interpretation
3.2.2 An Alternative View: The Karhunen-Loeve Transform
3.3 PCA in Astrophysics
3.3.1 Dimensionality Reduction
3.3.2 A Natural Basis
3.4 Financial Application
3.4.1 A Simple Example
3.4.2 The Application
3.4.3 Data Reduction & Eigenvalue Analysis
3.4.4 Eigenvector Analysis
3.4.5 Trading Strategy
3.4.6 Market Memory

4 Conclusion


Chapter 1

Introduction

Conceptually, physics and finance seem to have much in common. Both

market analyst and physicist speak daily about the ‘forces’ that affect the

movement of their worlds. They each search tirelessly to understand how

these forces arise, the exact effect they have, and how they act. Though

the force of supply and demand may not seem analogous to the force of

gravity, they affect the world in similar ways. Consider a physical system

that has been disturbed from its equilibrium position. The physicist’s goal

of understanding how equilibrium will be recaptured is identical to the mar-

ket analyst’s task of understanding how financial news will disturb today’s

market ‘equilibrium’ and how the market will change to recapture it. Both

the financial market and physical world change constantly in this way in

response to various forces. Are these similarities superficial or do these

conceptual similarities hint at a deeper overlap still to be explored?

The purpose of this paper is two-fold. It seeks to highlight the signifi-

cant conceptual overlap between physics and finance and demonstrate the

possibility for the advancement of financial research through the application

of mathematical methods currently widely used in physics. An additional

goal is to encourage the further pursuit of ‘financial physics’ which may lead

to perhaps more creative, descriptive, and effective financial models. This


paper focuses on two mathematical processes already used in astrophysics.

The first is Extreme Value Theory and the second is Principal Component

Analysis.

In the next chapter, we will explore the possibility of using Extreme

Value Theory to create a stock trading strategy. This strategy is outlined

in a past paper by Cecilia Muldoon. Here we shall carry through this strat-

egy and examine its performance against a standard benchmark, the S&P

500. The following chapter will focus on Principal Component Analysis.

The effectiveness of this statistical method in decomposing a financial time-

series will be evaluated. The goal of these applications is to discover to

what extent they can successfully be applied to financial investments and

indicate whether the search for further overlap between these two fields is

advantageous.


Chapter 2

Extreme Value Theory

2.1 Introduction

Extreme Value Theory is a branch of statistical mathematics that studies

extreme deviations from the median of probability distributions [17]. These

values are of practical importance because they often represent times of

greatest loss or gain. Catastrophic earthquakes, floods, and market crashes

are all examples of real world phenomena modeled by EVT. The basis of

the theory was established by Gumbel in 1958. Gumbel found that the dis-

tribution of statistically smallest or largest extremes of a distribution tends

asymptotically to analytic form for a general class of parent distributions

[2]. The general form of the extreme value cumulative distribution function

was found to be

F(x) = e^{-e^{-\alpha(x - x_0)}} \qquad (2.1)

for maxima, and for minima

F(x) = 1 - e^{-e^{\alpha(x - x_0)}} \qquad (2.2)

These distributions, along with the corresponding probability distributions, are

pictured in figure 2.1.


Figure 2.1: Gumbel Cumulative & Probability Distribution Functions [17]
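As a quick numerical illustration of Eqs. 2.1 and 2.2 (a minimal sketch, not code from the thesis; the minima case is written with +α in the inner exponent so that it increases in x), the two Gumbel CDFs can be evaluated directly:

```python
import math

def gumbel_cdf_max(x, alpha, x0):
    # Eq. 2.1: limiting CDF of the largest extreme
    return math.exp(-math.exp(-alpha * (x - x0)))

def gumbel_cdf_min(x, alpha, x0):
    # Eq. 2.2: limiting CDF of the smallest extreme
    # (sign chosen so the function increases in x, as a CDF must)
    return 1.0 - math.exp(-math.exp(alpha * (x - x0)))

# At the location parameter x0 the maxima CDF equals e^-1 ~ 0.368
assert abs(gumbel_cdf_max(0.0, 1.0, 0.0) - math.exp(-1)) < 1e-12
# Both are proper CDFs: monotonically increasing in x
assert gumbel_cdf_max(1.0, 1.0, 0.0) > gumbel_cdf_max(-1.0, 1.0, 0.0)
assert gumbel_cdf_min(1.0, 1.0, 0.0) > gumbel_cdf_min(-1.0, 1.0, 0.0)
```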

To date, extreme value theory is used in a variety of applications. In en-

gineering, it is widely used to model component failure risk and in the insur-

ance industry it is used to model the risk of large payoffs from catastrophic

events. Other fields of application include hydrogeology and meteorology.

In an unpublished junior paper entitled “Extreme Value Theory: Bright

Cluster Galaxies and the Stock Market” Cecilia Muldoon discusses recent

efforts by Bhavsar to use extreme value theory to determine whether bright

cluster galaxies are a special class of galaxy or just the extremes of a normal

distribution. His method compared how well empirical evidence matched

extreme value distribution models, normal distribution models, and combi-

nations of the two. Then inspired by her knowledge of extreme value theory,

Muldoon next proposed a financial trading strategy based on the theory as

formulated by adviser Gyan Bhanot. In the remaining portion of this chap-

ter we shall carry out the proposed strategy and examine its performance.

2.2 Method

The trading strategy is founded on the principle that, by basing stock selection on the constituents of the market's extreme value distribution, a portfolio

can be created that consistently outperforms the market. To form this portfolio Muldoon suggests using the past 12 months of returns to build an

extreme value distribution both for losses and gains at each time step as

shown in figure 2.2. For this project ∆t = 1 Month. The stocks of these

distributions will then be used to form the portfolio’s short and long po-

sitions, respectively, for the coming month. To build these distributions,

the whole population of stock returns of the past 12 months will be iso-

lated. Then from this group, sub-populations of S random returns will be

selected. The extremes (maximum and minimum) will be chosen from this

sub-population and retained for addition to the extreme value distribution.

These sub-populations will be chosen a few thousand times, the extremes of

which will be used to define the extreme value distribution of the past 12

month’s returns. From the two resulting distributions we form a portfolio

the performance of which we can then track by recording the performance

of these stocks the following month.
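A minimal Python sketch of this sub-sampling procedure (the thesis's calculations were done in MATLAB; the data format and names here are illustrative assumptions):

```python
import random

def extreme_value_distributions(returns, sample_size, n_samples, seed=0):
    """Build empirical extreme-value distributions of gains and losses.

    returns: dict mapping ticker -> trailing 12-month return (a hypothetical
    input format; the thesis works from monthly S&P 500 return data).
    """
    rng = random.Random(seed)
    tickers = list(returns)
    gains, losses = {}, {}                     # ticker -> times drawn as extreme
    for _ in range(n_samples):
        sub = rng.sample(tickers, sample_size)  # one random sub-population
        best = max(sub, key=returns.get)        # its maximum return
        worst = min(sub, key=returns.get)       # its minimum return
        gains[best] = gains.get(best, 0) + 1
        losses[worst] = losses.get(worst, 0) + 1
    # Volatile names appearing in both distributions are dropped,
    # as the method prescribes.
    for t in gains.keys() & losses.keys():
        gains.pop(t)
        losses.pop(t)
    return gains, losses

# Toy data: 'A' is the consistent winner, 'E' the consistent loser; with
# sample_size equal to the whole population, every draw selects them.
demo = {'A': 0.5, 'B': 0.1, 'C': 0.0, 'D': -0.1, 'E': -0.5}
g, l = extreme_value_distributions(demo, sample_size=5, n_samples=50)
assert g == {'A': 50} and l == {'E': 50}
```

The counts in `gains` and `losses` play the role of the empirical extreme value distributions from which the long and short positions are drawn.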

Figure 2.2: Example of Extreme Losses/Gains Distributions

To accomplish this task we will examine returns from the S&P 500 be-

cause it represents a “homogeneous” population but is diverse enough to be

representative of the market as a whole. The data we use contains monthly

stock returns for the S&P 500 constituents from the month of January, 1984

to December, 2004.


As a practical matter we introduce a few more logical constraints. The

first constraint is a pre-determined portfolio size. Our portfolio will contain

no more than 30 stocks. Stocks within this portfolio will be equally weighted.

Secondly, we introduce a step to eliminate the intersection of the two extreme

value distributions. If a stock is volatile it may appear in both distributions.

In an attempt to eliminate potential volatility from portfolio performance

we ignore these stocks. Finally, we ignore dividends and transaction costs

to facilitate calculations.

This leaves only three variables to modify to adjust portfolio perfor-

mance. They are: sample size, number of samples, and long/short ratio.

The first two refer to the process used to build the extreme value distributions. The sample size is the number of stocks, S, that form the sub-population groups from which extremes are selected. S, in this sense, is inversely proportional to the diversity of stocks in the extreme value distributions. There is a delicate balance between identifying stocks that consistently outperform and identifying the few stocks that happen to have extraordinary returns in some given month. The number of samples, N, is the number of times sub-populations are formed and extremes are selected. The product N · S, divided by the whole population size, gives the expected number of times an individual stock is drawn into a sub-population; this product should therefore be many times greater than the whole population size. Finally we are free to choose a long/short ratio, which determines the weight of long and short positions in the overall portfolio. As a practical matter, long positions are usually more heavily weighted within a typical portfolio since the market trends upwards.
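To make the N · S condition concrete (a back-of-the-envelope sketch; the population size is illustrative, not the exact S&P constituent count):

```python
def expected_draws_per_stock(n_samples, sample_size, population):
    # Each of the n_samples sub-populations contains a given stock with
    # probability sample_size / population, so on average a stock is
    # drawn n_samples * sample_size / population times overall.
    return n_samples * sample_size / population

# e.g. N = 2000 samples of size S = 100 from a ~500-stock population:
# N * S = 200,000 >> 500, and each stock is expected in 400 sub-populations.
assert expected_draws_per_stock(2000, 100, 500) == 400.0
```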

2.3 Results

Generally the returns produced by this strategy are tied to the adjustment

of two variables discussed before, sample size and long/short ratio. To begin

our investigation we examine the effects of sample size while keeping other


parameters constant. An initially low sample size of 30 will allow a great

diversity of stocks into the portfolio but still indicate whether the strategy

is profitable.

[Plot: Extreme Value Theory returns vs. time (months), several trials]

Figure 2.3: Sample Size-30; Number of Samples-2000; Long/Short Ratio-9

Figure 2.3 shows how the trading strategy performs over several trials.

Clearly the strategy is profitable, but how the strategy performs in compar-

ison to the S&P index has yet to be seen. Not surprisingly different trials

of the strategy yield different results. This will be especially true for lower

sample size values. This is because low sample sizes allow a greater diver-

sity of stocks into the portfolio. Though profitable, the wide variation in

expected returns is enough to deter investors. To remedy this we increase

the sample size to narrow the range of possible stocks that may comprise

our portfolio. The results are seen below in figures 2.4, 2.5 & 2.6.

As expected, the variation in expected returns shrinks as a result. Additionally, expected returns increase. The increased

sample size has decreased the randomness of the portfolio constituents and


[Plot: Extreme Value Theory returns vs. time (months)]

Figure 2.4: Sample Size-50; Number of Samples-2000; Long/Short Ratio-9

[Plot: Extreme Value Theory returns vs. time (months)]

Figure 2.5: Sample Size-70; Number of Samples-2000; Long/Short Ratio-9


[Plot: Extreme Value Theory returns vs. time (months)]

Figure 2.6: Sample Size-100; Number of Samples-2000; Long/Short Ratio-9

makes the portfolio more reliant on the true extreme value distribution.

The increase in returns suggests three things. The first is that higher sample sizes yield higher returns. The second is that higher sample sizes yield

lower variation in expected returns. And finally, that this trading strategy

is significantly better than a random portfolio of stocks. However, the op-

posite extreme for sample size is displayed in figure 2.7. Here the sample

size is 300. The result is a significant drop in returns. We conclude that at either extreme of sample size, returns decline. Thus there is some sample

size within this range that maximizes returns. According to our trials, this

optimal sample size is 100.

Next we investigate the effect of changing the long/short ratio. As a

general rule more weight is typically given to long positions based on the

reasoning that stock prices normally trend upwards. In the previous exercise

the long/short ratio was 9, that is 90% of the portfolio was invested in long

positions and the last 10% in short positions. Figure 2.8 demonstrates that


[Plot: Extreme Value Theory returns vs. time (months)]

Figure 2.7: Sample Size-300; Number of Samples-2000; Long/Short Ratio-9

shifting the long/short ratio towards short positions decreases returns. We deduce either that this trading strategy works less well for stocks experiencing extreme losses, or that, as a general rule, short positions do not help form a good long-term portfolio.

Finally, having roughly determined the optimal parameter values for

both long/short ratio and sample size, we compare the performance of this

trading strategy to owning the S&P 500 index. Figure 2.9 compares the

performance of both portfolios. The dotted red line represents the S&P

index and the two blue lines represent two trials for the extreme value the-

ory portfolio. For this comparison there is no short position. Briefly, the

performances are comparable. Our trading strategy does not outpace the S&P index, but neither does it fall short of the S&P returns. This means that

with added transaction costs and the higher volatility of the extreme value

portfolio a wise investor would probably prefer to own the S&P index.


[Plot: Extreme Value Theory returns vs. time (months) for long/short ratios 9, 3, 3/2, and 1]

Figure 2.8: Sample Size-100; Number of Samples-2000; Long/Short Ratio-Multiples

[Plot: Extreme Value Theory vs. S&P 500 returns vs. time (months)]

Figure 2.9: Sample Size-100; Number of Samples-2000; Long/Short Ratio-All Long


2.4 Conclusion

The performance of our optimal extreme value portfolio on average matches

the performance of the S&P. Considering that transaction costs were ignored and that the extreme value portfolio is more volatile, it seems that owning the S&P index is a better alternative to

this active trading strategy. On the other hand, we cannot yet disregard the

strategy completely. Given its competitive performance, it could be useful

as a technique to highlight potential stock investments that could then be

evaluated qualitatively as part of a more comprehensive trading strategy.

Furthermore, with additional computational power, a more rigorous effort

to optimize the important parameter of sample size might yield stronger

returns.


Chapter 3

Principal Component Analysis

3.1 Introduction

Principal component analysis, abbreviated as PCA, is a method of statisti-

cal analysis useful in data reduction and interpretation of multivariate data

sets [9]. The earliest defining application of principal component analysis

was in the social sciences. In 1904 Charles Spearman published “General

Intelligence, Objectively Determined and Measured” in the American Jour-

nal of Psychology. In the paper, he examined human intellectual ability as

it related to various subject matters: mathematics, writing, critical reason-

ing, etc. His analysis using PCA suggested an underlying intelligence factor

that determined intellectual ability regardless of subject matter. This fac-

tor became widely known as IQ. The consequence of the study for PCA was

recognition as a statistical method essential in analyzing large data sets.

PCA has now spread to a large number of scientific fields and is used in a

variety of different applications where analysis of inter-object correlations is

the focus. In particular, the past decades have seen growing use of PCA in

cosmology as the field confronts rising statistical challenges.


A general statement of the problem solved by PCA is the following:

Analyze the relationship between the m parameters of n objects provided a

given m×n data set. The central step of PCA is the redefinition of this data

set in terms of a new set of variables which are mutually-orthogonal linear

combinations of the original variables. The new variables define coordinate axes in multivariate data space that form a 'natural' classification of the

data. The motivation behind this redefinition is the dimensionality reduction

achieved by an orthogonal basis. PCA condenses correlations in the data to

single variables, finding the 'true dimensionality' of the data set. Provided a willingness to sacrifice some accuracy for economy of description, there is huge potential for significant reduction in data dimensionality. This

makes PCA a powerful and versatile analysis method.

In this chapter we shall explore the basic mathematical description of the

process and discuss the benefits of such an analysis procedure in the context

of astrophysics. Then we will apply this analysis to a financial time series

as a preliminary search for market structure and trading strategies.

3.2 Mathematical Description of Basic Method

3.2.1 Geometric Interpretation

In the multi-dimensional space of the original data set, defined by its column

and row vectors, PCA seeks new axes that best summarize the data. To that

end PCA chooses the axis of best summarization first, followed by the next

best summarization, and so forth until the last bit of data variability has

been accounted for. This first axis, or ‘best fit’ axis, is defined as the axis

that minimizes the sum of the squared Euclidean distances between itself

and the original data points. This is visually represented in Figure 3.1. So

this first new axis is just a best fit line as found in a normal regression. The

next best, or second new axis, will form in conjunction with the first axis

a best-fit plane that minimizes the sum of the squared Euclidean distances


to it. The third axis will create a best-fit subspace in conjunction with

the first two axes. This process continues until all data is accounted for

by a given axis. Consider figure 3.1 once again. Minimizing the sum of

squared Euclidean distances is apparently equivalent to maximizing the sum

of squared projections onto this new axis. This in turn is equivalent to

maximizing the data variance accounted for. We conclude that each new axis is 'best fit' in the sense that it explains the maximum amount of residual variability in the data set. Hence PCA replaces the original data axes with

a new set of orthogonal best-fit axes that successively account for as much

residual variability as possible.

Figure 3.1: Simple Visual Representation of PCA

The mathematics of this process is well laid out by Murtagh & Heck

(1987), which shall guide the following explanation. If u is the first new axis

of the PCA process, then the projection of the data in the original data set,

X, on this axis is given by Xu. As prescribed, our axis, u, must maximize

the squared projections given by

(Xu)′(Xu)

This quadratic will be unbounded unless a suitable restriction is put on u.

So let u be of unit length such that u′u = 1. Now letting S = X′X and


given the constraint u′u = 1, we now maximize,

u′Su

To solve, introduce a Lagrangian multiplier,

u′S u− λ(u′u− 1)

Differentiating yields

2Su− 2λu = 0

which reduces to an ordinary eigenvalue problem,

Su = λu (3.1)

This result suggests that the first new axis is equivalent to the first eigen-

vector u1 of the matrix S corresponding to the largest eigenvalue λ1. We

may then repeat this procedure to find the second axis, but now introducing

the additional constraint of orthogonality with the first axis. This yields a

second solution to Equation 3.1 with eigenvector u2 corresponding to the

second largest eigenvalue λ2. Hence each successive eigenvector is also the

next new PCA axis. Furthermore because eigenvalues determine the rela-

tive merit of eigenvectors, the eigenvalue λk represents the relative amount

of data variability accounted for by the corresponding eigenvector, uk.
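The derivation above can be checked numerically. A short sketch with synthetic, centered data (not the thesis's data), taking S = X'X and verifying that its eigenvalues account for all of the data variability:

```python
import numpy as np

# Eq. 3.1 in practice: the principal axes of a data matrix X
# (rows = objects, columns = parameters) are the eigenvectors of
# S = X'X, ordered by eigenvalue.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 2.0 * X[:, 0]             # build in a strong correlation
X -= X.mean(axis=0)                  # center each parameter

S = X.T @ X
eigvals, eigvecs = np.linalg.eigh(S)       # eigh: S is symmetric
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The eigenvalues together account for all the squared (centered) data.
assert np.isclose(eigvals.sum(), (X ** 2).sum())
# The first axis maximizes u'Su over unit vectors u, attaining lambda_1.
u1 = eigvecs[:, 0]
assert np.isclose(u1 @ S @ u1, eigvals[0])
```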

Analysis of this result proves difficult when S = X′X. This is because,

as in astrophysical data sets, parameters in X are often measured in differ-

ent units. For example distance and luminosity are common parameters in

these data sets and are measured on very different scales. The result is an

apparent overemphasis on certain observations. The standard solution to

this problem is to normalize the data set by reexpressing S as a correlation

matrix. A correlation matrix holds the correlation coefficients of all variable

combinations such that cell Sij holds the correlation coefficient of variable i

with variable j. This correlation coefficient is mathematically defined as

\mathrm{corr}(X, Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2\right]^{1/2}} \qquad (3.2)

where the numerator is the covariance between the two variables and the denominator is the product of the variables' standard deviations. In words,

correlation is a measure of the extent to which variables are related. A cor-

relation coefficient represents the strength of this relationship by assigning

it a value between 1 and −1. Near 1 indicates high correlation, which means

the variables are highly related, 0 means not correlated, and −1 represents

a strong inverse relation. In practice S is most frequently expressed as a correlation matrix, and otherwise as a covariance matrix.
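Eq. 3.2 can be computed directly and checked against numpy's built-in correlation (a minimal sketch on toy data; any two equal-length samples work):

```python
import numpy as np

def corr(x, y):
    # Eq. 3.2: covariance term over the product of standard-deviation terms.
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
assert np.isclose(corr(x, y), np.corrcoef(x, y)[0, 1])
assert np.isclose(corr(x, x), 1.0)   # a variable is perfectly correlated with itself
```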

Continuing on, the solution to equation 3.1 is well known. In general,

a matrix S can be reduced to a diagonal matrix L by premultiplying and

postmultiplying it by a particular orthonormal matrix U such that

U′SU = L (3.3)

The matrix L is a diagonal matrix whose elements are the eigenvalues (λk)

of S. The columns that make up U are its eigenvectors (uk). Equation 3.1

reduces to the characteristic equation,

|S− λI| = 0 (3.4)

where I is the identity matrix. Equation 3.4 leads to a kth degree polynomial

with k roots. These roots are the eigenvalues, λk, that line the diagonal of L. Plugging each

of these values into the equation

[S− λkI]uk = 0 (3.5)

yields k corresponding eigenvectors, uk.
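Equations 3.3-3.5 can be verified numerically on a small symmetric matrix (an illustrative sketch, not tied to any data in the text):

```python
import numpy as np

# Eq. 3.3: an orthonormal eigenvector matrix U diagonalizes the symmetric
# matrix S, U'SU = L, with the eigenvalues of S on the diagonal of L.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, U = np.linalg.eigh(S)
L = U.T @ S @ U
assert np.allclose(L, np.diag(eigvals))
assert np.allclose(U.T @ U, np.eye(2))       # U is orthonormal

# Eq. 3.5: each column of U solves (S - lambda_k I) u_k = 0.
for lam, u in zip(eigvals, U.T):
    assert np.allclose((S - lam * np.eye(2)) @ u, 0.0)
```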


3.2.2 An Alternative View: The Karhunen-Loeve Transform

We have already seen a geometric representation of PCA. Here we take

time to examine an alternative view of PCA as an expansion of terms. The

expansion is commonly named the Karhunen-Loeve Transform and it is this

transform that is often used in astrophysics.

Complementary principal components vk may be found by performing

PCA in the dual space of X. That is, both left and right eigenvectors may

be found. Again, following the discussion of Section 3.2 and Murtagh &

Heck (1987), we maximize:

(X'v)'(X'v)

subject to

v'v = 1

As before this produces

XX'v_1 = \mu_1 v_1

Premultiplying the original eigenvalue problem X'Xu_1 = \lambda_1 u_1 by X gives

(XX')(Xu_1) = \lambda_1 (Xu_1)

thus \mu_1 = \lambda_1, since the two eigenvalue problems are identical, and so we find

v_1 = \frac{1}{\sqrt{\lambda_1}} X u_1

and more generally

v_k = \frac{1}{\sqrt{\lambda_k}} X u_k \quad \Rightarrow \quad u_k = \frac{1}{\sqrt{\lambda_k}} X' v_k \qquad (3.6)

Now taking this relationship (Xu_k = \sqrt{\lambda_k}\, v_k), postmultiplying by u'_k, and summing gives

X \sum_{k=1}^{n} u_k u'_k = \sum_{k=1}^{n} \sqrt{\lambda_k}\, v_k u'_k

which, given the orthonormality of the vectors, reduces to

X = \sum_{k=1}^{n} \sqrt{\lambda_k}\, v_k u'_k \qquad (3.7)


The result is an expansion of the original data in terms of its orthogonal

basis. This expansion is called the Karhunen-Loeve expansion. As with any

expansion of terms a suitable approximation of X may be given by the first

few terms in the expansion.
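The expansion of Eq. 3.7 can be sketched numerically: the full sum reproduces X exactly, while a truncated sum gives the low-rank approximation described above (synthetic data, assumed full rank):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))

# Eigen-decompose S = X'X and form the dual vectors v_k of Eq. 3.6.
eigvals, U = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]
V = X @ U / np.sqrt(eigvals)                 # v_k = X u_k / sqrt(lambda_k)

# Eq. 3.7: X as a sum of rank-one terms sqrt(lambda_k) v_k u_k'.
terms = [np.sqrt(lam) * np.outer(v, u)
         for lam, v, u in zip(eigvals, V.T, U.T)]
assert np.allclose(sum(terms), X)            # full expansion recovers X

approx = sum(terms[:2])                      # keep only the two largest terms
trunc_err = np.linalg.norm(X - approx)
assert trunc_err < np.linalg.norm(X)         # truncation approximates X
```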

3.3 PCA in Astrophysics

Our objective in this section is to understand the main benefits of perform-

ing PCA by examining the function it serves in a few astrophysical applica-

tions. We shall focus on two main benefits, reduction of dimensionality and

orthogonal variables.

3.3.1 Dimensionality Reduction

Only decades ago astrophysicists struggled to extract as much information as possible from relatively small data samples. But, thanks mostly

to large digital sky surveys, a field that once lacked data has now become

inundated with astronomical data sets of unprecedented size and complex-

ity. The trend of increasing data, which results from large scale surveys such

as 2MASS (Two Micron All Sky Survey), SDSS (Sloan Digital Sky Survey), or 2dFGRS (Two-degree Field Galaxy Redshift Survey), has transformed the

problems of data analysis dramatically. These surveys compile observations

on millions of objects and potentially dozens of parameters. The informa-

tion from these surveys could yield answers to basic astronomical questions

and help measure the global cosmological parameters to within a few per-

cent. But to accomplish this, these large data sets need to be properly and

comprehensively analyzed.

The analysis of such data sets is obviously an inherent multivariate sta-

tistical problem. PCA shows great promise in this role because PCA is an

efficient and objective statistical method of determining physically interest-

ing multivariate correlations. As demonstrated, PCA condenses correlated


variables into new uncorrelated variables thereby gaining an economy of de-

scription with little data loss. Reducing the data to its true dimensionality

means capturing underlying trends present in the original data set.

In 1973, Brosche applied PCA to describe the statistical properties of

galaxies. Using a small data set of 31 objects with 7 measured parameters

he found that two independent variables accounted for nearly the whole variability of the data. Later, in 1981, Balkowski, Guibert & Bujarrabal

would confirm this bidimensionality in another paper examining a larger

data set. The two axes grouped the parameters such that diameter, HI mass, indicative mass, and luminosity were contained in an axis labeled "size," while morphological type and color index fell into a second called "aspect." This reduction in dimensionality revealed underlying parameter correlations, which is a primary benefit of PCA.

But finding the true dimensionality also provides a method for data re-

duction. The two axes Brosche found captured 83% of the original data

implying little loss of information from ignoring the remaining eigenvectors. The

data reduction provided by PCA allows the possibility of analyzing larger

data sets. Currently, astronomical data sets increase by an order of magni-

tude in size each generation. This rate outpaces the growth of computational

speed as determined by Moore’s Law. Soon standard analysis methods will

become infeasible for next generation data sets. PCA provides the data

compression necessary to analyze these larger, more complex data sets.

In light of this, Tegmark, Taylor, & Heavens (1997) consider the possi-

bility of PCA as a standard step for linear data compression. Following this

paper, we consider astronomical observations to be a random variable, x, with probability distribution L(x; Θ) dependent on a vector of parameters

Θ = (θ1, θ2, ..., θm)

The typical procedure for estimating a particular parameter θi is to maxi-

mize its likelihood function, where the likelihood function of θi is a condi-

tional probability function of θi holding all other arguments as fixed. This


maximum likelihood estimation procedure seeks the most likely value of pa-

rameter θi. Important in this procedure is the Fisher Information Matrix

defined by

F_{ij} \equiv \left\langle \frac{\partial^2 (-\ln L)}{\partial \theta_i \, \partial \theta_j} \right\rangle \qquad (3.8)

This matrix defines our ability to estimate a specific set of parameters and

is a measure of the information content of the observations relative to the

particular parameter. Mathematically, the Fisher Information Matrix is the

variance of the score, where statistically speaking the score is the partial

derivative, with respect to θi, of the natural log of the likelihood function.

So visually this is a measure of the sharpness of the support curve near the maximum likelihood estimate. By the Cramér-Rao inequality, for any unbiased estimator the minimum error of this estimation procedure is given by

\Delta\theta_i \geq 1 / \sqrt{F_{ii}}
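As a concrete case of the Fisher information and the Cramér-Rao bound (an illustrative assumption, not an example from the text): for n i.i.d. Gaussian observations with known variance σ², the information on the mean is n/σ², so no unbiased estimate of the mean can beat an error of σ/√n:

```python
import math

# For n i.i.d. Gaussian draws with known sigma,
# -ln L = (n/2) ln(2 pi sigma^2) + sum_i (x_i - mu)^2 / (2 sigma^2),
# so d^2(-ln L)/d mu^2 = n / sigma^2 for every sample, giving
# F_mumu = n / sigma^2 and Delta mu >= sigma / sqrt(n).
def fisher_gaussian_mean(n, sigma):
    return n / sigma ** 2

def cramer_rao_bound(n, sigma):
    return 1.0 / math.sqrt(fisher_gaussian_mean(n, sigma))

assert math.isclose(cramer_rao_bound(100, 2.0), 2.0 / math.sqrt(100))
```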

Furthermore the covariance matrix C is related to the Fisher matrix as

C^{-1}_{ij} \equiv \frac{\partial^2 (-\ln L)}{\partial \theta_i \, \partial \theta_j}

For the Gaussian case a well known identity is

F_{ij} \equiv \frac{1}{2} \mathrm{Tr}\!\left( A_i A_j + C^{-1} M_{ij} \right) \qquad (3.9)

where A_i \equiv (\ln C)_{,i} and M_{ij} \equiv \langle D_{,ij} \rangle, using standard comma notation for

derivatives.

Now as in Tegmark, Taylor & Heavens (1997) consider a general linear

compression

y = Bx

where y is the new compressed data set. Substituting into equation 3.9 and

assuming B = b^t, that is, B is a single row vector transformation, we find

that Equation 3.9 reduces to

F_ii = (1/2) (b^t C,_i b / b^t C b)² + (b^t μ,_i)² / (b^t C b)        (3.10)


So our task is to find the transformation that maximizes this value. When

µ is independent of the parameter then the second term in Equation 3.10

vanishes and we now seek to maximize

(2F_ii)^{1/2} = |b^t C,_i b| / (b^t C b)

Since the denominator is always positive, because the covariance matrix is

positive definite, the search reduces to finding an extremum of the numerator.

Normalizing b we again arrive at the Lagrangian problem

b^t C,_i b − λ b^t C b

which as we have seen reduces to an ordinary eigenvalue problem. Thus

they have shown that the optimal linear compression is the Karhunen-Loeve

Transform. Because of this result Tegmark, Taylor, & Heavens (1997) sug-

gest that PCA become a standard method of data compression for the in-

creasingly large cosmological data sets available.

In conclusion, the main benefit of PCA, the reduction of data sets to

their true dimensionality, serves two important functions in astrophysical

statistical problems. The first is the capture of underlying trends by iden-

tifying parameter correlations. The second is data compression, which is

important when analyzing large data sets.

3.3.2 A Natural Basis

The use of PCA to analyze redshift surveys has been advocated by Vogeley &

Szalay in several papers. If current models are correct then large-scale struc-

ture in the universe is the result of gravitational instability on an initially

Gaussian density field. For this reason describing the large-scale structure

of the universe may lead to a deeper understanding of the early universe.

Thus finding a suitable way to analyze information from redshift surveys is

of critical importance to current astronomical research.

The preferred way to characterize this structure distribution is the power

spectrum, which is typically estimated by directly summing the planewave


contributions from each galaxy. However such a Fourier expansion has sev-

eral weaknesses. The most significant is the non-orthonormality of the basis

in samples of complex geometry, such as pencil surveys and deep slice sur-

veys. Thus precisely because of its ability to form an orthonormal basis for

any data set, Vogeley & Szalay (1996) suggest an alternative expansion in

terms of the Karhunen-Loeve Transform. The standard PCA procedure is

used to define a new orthonormal basis within the original data space.

Such an analysis has been shown to yield results that

are comparable to those obtained using traditional methods. Furthermore, a

significant amount of data reduction has been shown to be possible.

This not only shows the benefit of the K-L transform in analyzing redshift

surveys of irregular geometries but also demonstrates the usefulness of PCA

in defining a new basis within the original data set. That

is, PCA allows the data to suggest the best basis for data analysis. This

is useful, as shown, when traditional expansions function poorly. But also

for statistical data mining, where little a priori knowledge of the results is

available, PCA is an ideal representation of the data.

3.4 Financial Application

The benefits of principal component analysis seem especially appealing when

examining the movement of the stock market. Every year thousands of

companies spend millions processing the dizzying amount of information

available about the market. They hope to understand why events occurred

and how to predict if and when they will occur again. If PCA can help in

any way simplify this analysis or give some insight into the market, this will

be a worthwhile effort.


3.4.1 A Simple Example

To further solidify comprehension of the technique, a simple example is

introduced here. In order to foreshadow its financial application in this

paper, stock returns are used as observations. The mathematical process

should become clearer and a concrete example now will serve to introduce

the specific method used later on.

This example considers three stocks and their returns over a span of the

past three months. The three stocks are: Acme Corp (A), Bells Corp (B),

and Cornerstone Corp (C). The matrix below holds the monthly returns

data for the past three months for each of these three stocks.

              A          B          C
Month 1    .43269    .130435    .319149
Month 2   −.01342   −.00962    −.06452
Month 3   −.06452    .178808   −.05085

The first task will be to use these returns to form a correlation matrix

using the definition of correlation in equation 3.2. This will require the

standard deviation and mean of each stock’s data.

µA = .0496 σA = .2622

µB = .0692 σB = .1007

µC = .0553 σC = .1795

This process is made quite easy with the use of computers and mathematical

software. Plugging in and solving gives,

Correlation Matrix =

    1        .4400    .9106
    .4400    1        .3220
    .9106    .3220    1

Next it is possible to use a computer to quickly calculate the eigenvalues


of this matrix and their associated eigenvectors. They are:

    diag(Eigenvalue3, Eigenvalue2, Eigenvalue1) = diag(.080225, .75843, 2.1613)

The trace of this eigenvalue matrix is 3, as expected. Knowing that the

eigenvalues reflect the relative importance of their associated eigenvector,

the first eigenvector is expected to be the most telling principal component

of the three. In fact, it will account for 2.1613/3 ≈ 72% of the data variation.

On the other hand the last eigenvector is inconsequential, accounting for

3% of the variation. Thus PCA has reduced a 3 variable problem into a 2

variable problem. The associated eigenvectors are:

(Eigenvector3  Eigenvector2  Eigenvector1) =

     0.7252     0.21829    0.65302
    −0.10889   −0.90012    0.42181
    −0.67987    0.377      0.629

where the columns hold, left to right, eigenvectors 3, 2, and 1, and the rows

correspond to stocks A, B, and C.

Such a simple example does not lend itself to firm interpretation since

these returns exist as a minute subset of a much larger stock market. How-

ever, imagine for the moment that these three stocks were of interest. It

would be simple to establish that the first eigenvector demonstrates a strong

general correlation among the three stocks. This is because all coefficients

in this eigenvector are positive. Stocks A and C have the highest magnitude

coefficients in this first eigenvector indicating that the correlation described

by the first eigenvector is most strongly felt by these two stocks. The sec-

ond eigenvector describes a weaker trend of anti-correlation of stock B with

stocks A and C, as shown by the negative sign of stock B’s coefficient in the

second eigenvector.

Having found these eigenvectors and eigenvalues finishes the mathemat-

ical task required for the principal component process. The next task is to


expand this process to analyze a financial time series comprised of a much

larger universe of stocks, the S&P 500.

3.4.2 The Application

As demonstrated in section 3.4.1, stocks can be analyzed by PCA like any

other correlated variable. For such an analysis either returns or price can be

used as observations. For this application, the process outlined in section

3.4.1 is exactly the process used here. The main difference is the scale. The

universe of this application is the S&P 500, because it is a good representa-

tion of the market as a whole. The data set used is the same as in chapter 2,

twenty years of monthly S&P 500 stock returns beginning in January 1985

and ending in December 2004. The result of this expansion in scope is that

there will now be 500 original variables to track which when analyzed using

PCA will produce 500 eigenvalues and 500 eigenvectors monthly. There are

240 months in this time series and so this means calculating 240× 500 new

eigenvectors in total. To find monthly eigenvectors a correlation matrix is

formed using the previous year’s returns. The eigenvalues and eigenvectors

of this correlation matrix were then found. Each month this process was re-

peated, revealing new results. The time-dependent nature of the data set as

well as this dramatic increase in scope makes the process of eigenvector

interpretation much more complicated than in section 3.4.1. This endeavor

is intimidating especially given the exploratory nature of this exercise. For-

tunately, our project is guided by the search for three main results:

1. Signals

2. Structure

3. Trading Strategies
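The monthly procedure described above (a correlation matrix from the trailing twelve months of returns, followed by an eigendecomposition) can be sketched as follows; random data stand in for the S&P 500 returns here, and a smaller stock count is used for speed:

```python
import numpy as np

# Rolling-window PCA: each month, form the correlation matrix of the
# previous 12 months of returns and take its eigendecomposition.
# `returns` is a (months x stocks) array of synthetic stand-in data.
rng = np.random.default_rng(1)
returns = rng.normal(0.01, 0.05, size=(240, 20))  # 20 stocks for illustration
window = 12

eigvals_by_month = []
eigvecs_by_month = []
for t in range(window, returns.shape[0]):
    corr = np.corrcoef(returns[t - window:t], rowvar=False)
    w, v = np.linalg.eigh(corr)        # eigenvalues in ascending order
    eigvals_by_month.append(w[::-1])   # store largest first
    eigvecs_by_month.append(v[:, ::-1])

eigvals_by_month = np.array(eigvals_by_month)
# Fraction of variation captured by the first component each month:
first_frac = eigvals_by_month[:, 0] / eigvals_by_month.sum(axis=1)
print(first_frac.shape)
```

Tracking `first_frac` over time is exactly the eigenvalue series examined later in this section.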

Signals are anything that might be useful as an indicator of future events.

For example, if a unique eigenvector composition is consistently observed be-

fore a market crash or period of market growth, then this unique eigenvector


composition is a signal that can be responded to in the future. Provided

with a signal of future events, one could reduce risk and increase gains.

Market structure is the organization and hierarchy of stocks. For in-

stance, the market is organized by industry such that industry trends con-

tribute to the overall market trend. Additionally, certain stocks within the

market carry relatively more weight, or influence, in determining market

movement. These types of divisions and classifications define market struc-

ture. Our analysis might suggest an underlying structure to the market.

By reducing stock correlations, a new, more fundamental market structure

might be revealed that may yield a deeper insight into the market and its

movement.

Finally, a more all-encompassing objective is to find any addi-

tional information that could be put to use to build a sound and successful

trading strategy.

3.4.3 Data Reduction & Eigenvalue Analysis

The numerical calculations, though lengthy and time consuming, are not

the challenge of the analysis. The challenge is understanding the results of

these calculations and determining how to extract meaningful information.

The time dependency of the data presents a formidable challenge to this.

Here the difficult job of interpretation becomes even more substantial. The

composition of the eigenvectors is determined entirely by the data and thus

changes with time. Inherent variation in the data can cause considerable

changes in eigenvector composition, but so can other factors like the dele-

tion or addition of a stock to the S&P 500. Furthermore, even if the analysis

is capable of revealing market structure it would be unreasonable to expect

this structure to remain unchanged over the last twenty years. As the econ-

omy transforms, and market forces change, we expect the eigenvectors to

dynamically conform to describe the new situation. While successive calcu-

lations allow for the search for patterns and trends, the fact that these may


come and go as time passes makes their identification a delicate process.

Fortunately, there is hope. As promised, PCA distills the enormous amount

of information regarding these 500 variables into only a few uncorrelated

variables. In fact, for our purposes it turns out only the first 10 of 500

eigenvectors are needed to sufficiently describe the data and beyond that

only the first few contain meaningful trends.

Using only a small subset of the total available eigenvectors raises an im-

portant question: how many, and which, to use? There are a few conventions

to follow. The scree test is based on a visual graph: the eigenvalues

are graphed from greatest to least, producing a sharply concave scatter plot.

A cutoff is then determined visually by judging after which eigenvalue only a

remainder of essentially numerically indistinguishable eigenvalues remains. An-

other method prescribes keeping all the eigenvectors with eigenvalues greater

than the eigenvalue mean. The last method, the one used here, is based on

the fact that the amount of variation accounted for by j eigenvectors can be

determined exactly by summing the eigenvalues of those j eigenvectors and

dividing by the trace of the eigenvalue matrix. So if one chooses an accept-

able numerical cutoff, an acceptable percent of the total variation accounted

for, this can be the basis of the stopping rule. Or mathematically,

( Σ_{i=1}^{j} λ_i ) / ( Σ_{i=1}^{N} λ_i ) ≥ Cutoff Fraction

where N is the total number of eigenvalues. Graphing this quantity for the

first few principal components yields figure 3.2.
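This stopping rule amounts to keeping the smallest number of components whose cumulative eigenvalue fraction reaches the cutoff. A minimal sketch, with a hypothetical eigenvalue spectrum rather than the S&P data:

```python
import numpy as np

# Stopping rule: keep the smallest j whose eigenvalues account for at
# least `cutoff` of the total variation.
def components_needed(eigenvalues, cutoff=0.95):
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    frac = np.cumsum(lam) / lam.sum()
    # First index where the cumulative fraction reaches the cutoff:
    return int(np.searchsorted(frac, cutoff) + 1)

# Hypothetical 500-eigenvalue spectrum: ten dominant components carrying
# 96% of the variation, with the remaining 4% spread over 490 components.
lam = np.array([60, 8, 6, 5, 4, 4, 3, 2, 2, 2] + [4.0 / 490] * 490)
print(components_needed(lam, 0.95))
```

For this spectrum the rule retains exactly ten components, mirroring the ten eigenvectors kept in the analysis above.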

Figure 3.2 suggests that the first ten eigenvectors provide sufficient in-

formation to describe the data set. For these first ten eigenvectors the lower

bound cutoff fraction over time is 95%. So retaining only the top ten eigen-

vectors means that dimensionality has been reduced by 98% while retaining

95% of the original information. This incredible reduction demonstrates the


[Figure: Fraction of Total Variation vs. Time; percent of total variation

against time in months, with curves for the first component alone and the

first ten components combined.]

Figure 3.2: Cumulative percent of variation accounted for by successive

components over the time series. The first 10 eigenvectors are shown,

demonstrating the high amount of variation these first few eigenvectors

account for.


strength of using PCA for financial data sets. The majority of the market

has been captured by these few new variables and PCA has narrowed our

field of investigation to these few significant, possibly interpretable variables.

Eigenvalues highlight the important eigenvectors, but because they

represent the relative importance of an eigenvector, much more can be

learned from them. For example, examining figure 3.2 reveals that the first

eigenvector remains dominant in relative importance over time, but the last

nine eigenvectors, instead of being of markedly decaying importance, are of

relatively equal importance to each other. This suggests that the market

is dominated by a single trend but comprised of many additional trends

of lesser importance. Also consider the nature of the first eigenvalue over

time. Clearly, it is changing from month to month. This means the fraction

of market variation captured by the first eigenvector changes. Often these

changes are gradual, but sometimes the changes are marked by drastic

increases. An example of this occurs between t = 22 and t = 35.

This indicates that over that time period, in response to certain market

forces or events, a distinct and broad market trend is being experienced

and thus a larger portion of market variability is being captured by the

first eigenvector. An interesting feature of these drastic changes is that the

amount of time they appear for is generally quite uniform. As demonstrated

by the arrow widths in figure 3.2 the market enters these abnormal trend

periods abruptly and then remains in such a state for approximately one

year before the market ‘relaxes’ back to a normal mode. In trading this

could be very useful information. If a trader observes the market entering

this abnormal mode or knows how long the market has been in the mode, he

can then estimate how long prevailing trends will last until a normal market

is regained.

Another potentially interesting comparison is between the fraction of

variation accounted for by each principal component and the movement of

the S&P 500. As demonstrated by the previous exercise, this fraction can


change quite substantially over time. And it might be informative to observe

these changes in comparison to S&P 500 movements to see if there is any

connection. Two benefits come from this comparison. The first is that these

eigenvalue changes might be explained by the S&P 500 movements. For ex-

ample, if an abnormal mode begins the same month as a crash then there is

a clear causal relationship that might in turn also suggest something about

the nature of the crash or subsequent recovery. The second is that eigen-

value changes or magnitudes might serve as a good signal for future S&P

performance. So in this case, we observe if any of the top ten eigenvectors

display indicative and consistent patterns prior to a major event.

[Figure: two panels against time in months; S&P returns (top) and the first

principal component eigenvalue in percent (bottom).]

Figure 3.3: Comparison of first eigenvalue movement with market move-

ment.


There are a few conclusions that come from this analysis. The first in-

teresting observation is that market crashes generally occur when the first

eigenvalue is relatively high. Specifically, a crash occurs or is imminent

only when the first eigenvalue is 40% or greater. A market crash will pro-

duce high first eigenvalues since the first eigenvector will capture this broad

market trend. However high first eigenvalues seem to occasionally precede

market crashes as well. And while high eigenvalues can be explained by

market crashes, market crashes may also be explained by high eigenvalues.

A high eigenvalue indicates that the market is driven largely by a singular

trend which could warn of market instability. An appropriate analogy is the

stability of a table supported by four legs compared to just one. It could be

that high first eigenvalues signal large market downswings. Of course there

are not enough crashes to say with confidence that this is a trend. Fur-

ther investigation will be required. More supporting evidence comes from

a second observation of low first eigenvalues during strong market growth.

Specifically, over the 1990s (t = 80–150), regarded as a classic example of

a bull market, first eigenvalues are at an all-time low.

To investigate more closely the possibility of market signals in the first

eigenvalue movements consider figure 3.4. The most striking feature of the

plot is the rapid increase in eigenvalue during the 1987 stock market crash.

This rapid increase is indicative of the general recovery trend of stocks fol-

lowing the crash. An ‘abnormal’ mode follows the crash. This mode is the

result of the crash and not a signal for the crash as the figure clearly shows.

However it may be informative for further investigation to perform this same

investigation using daily returns. If they exist, signals may be more visible

on this time scale.

This same analysis could be performed for the next nine principal compo-

nents; however, the remaining nine eigenvalue plots follow approximately the

same pattern. This suggests that they are reacting to the first eigenvalue and

change very little independently. If this is true then these plots have little


[Figure: two panels for months 10 to 35; S&P returns (top) and the first

principal component eigenvalue in percent (bottom).]

Figure 3.4: Closer comparison of first eigenvalue movement with market

movement at the 1987 market crash.


to reveal. In conclusion it seems that there is a loose anti-correlation be-

tween market returns and first eigenvalues. Low eigenvalues indicate market

stability and growth and high eigenvalues foreshadow market downswings.

3.4.4 Eigenvector Analysis

Now we move past the information provided by the eigenvalue matrix and

on to an analysis of the eigenvectors themselves. The eigenvectors are

lists of coefficients, each corresponding to a stock in the S&P 500. These

coefficients indicate the weight of that stock within a particular eigenvector.

Higher magnitude coefficients indicate greater importance within the eigen-

vector. The distribution of these coefficients among the stocks defines each

eigenvector. From these coefficients one can examine which stocks form the

core of the eigenvector by finding all those with the highest magnitude coef-

ficients. In other words, one can find which group of stocks best represents

the eigenvector as a whole. Furthermore, by comparing the stocks in this

core group to each other one may determine a connection between them

and understand what larger group they are a part of. This could lead to

a suitable approach for eigenvector interpretation. For a static eigenvec-

tor of 500 coefficients this is already challenging and unfortunately, due to

the time-dependent nature of this data set, eigenvector composition changes

through time. Thus taking advantage of this interpretive approach requires

creative methods. As a preliminary measure we look at the eigenvectors’

performance. To do this, at each new time step t0 new eigenvectors are

found using the past year’s data. Then stock is bought in amounts weighted

by the eigenvector coefficients. The magnitude of the coefficient determines

how heavily each eigenportfolio invests in a certain stock and the sign of

the coefficient determines whether that stock is bought or sold (shorted).

Portfolio performance is then measured by the average of the product of the

portfolio stock coefficients and their returns over month t0. By examining

the trends of these eigenvectors’ performance it may be possible to under-


stand the eigenvectors and use them without directly sifting through their

composition.
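The eigenportfolio performance measure described above can be sketched as a short function; the eigenvector and returns below are illustrative, not taken from the data:

```python
import numpy as np

# Eigenportfolio performance: the eigenvector coefficients act as portfolio
# weights (negative coefficients are short positions), and performance over
# the next month is the average of coefficient-weighted stock returns.
def eigenportfolio_return(eigvec, next_month_returns):
    """Average of the products of portfolio coefficients and stock returns."""
    eigvec = np.asarray(eigvec, dtype=float)
    r = np.asarray(next_month_returns, dtype=float)
    return float(np.mean(eigvec * r))

# Illustrative four-stock eigenvector and next-month returns. The third
# coefficient is negative, so that stock is shorted.
vec = np.array([0.6, 0.5, -0.2, 0.3])
r_next = np.array([0.02, -0.01, 0.04, 0.03])
print(eigenportfolio_return(vec, r_next))  # 0.002
```

Repeating this at each time step, with eigenvectors re-estimated from the trailing year, produces the performance series examined here.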

The next step is to define sectors. In finance, traders often break up the

S&P 500 universe into 10 different sectors. This decomposition allows for a

more detailed explanation of the whole market’s movement. These sectors

are defined by industry. The utility of such a grouping strategy lies in the

high correlation between stock returns of like companies. By forming sectors

analysts hope to understand the composite trends that move the market. In

essence, the current PCA analysis has done the same thing. Within each

eigenvector is a grouping of like stocks based on correlation. This is defined

by individual stock coefficients within each eigenvector. So the ten eigenvec-

tors have essentially redefined the ten S&P sectors, this time not in terms

of industry but purely based on correlation of stock movement. However,

because each eigenvector is comprised of all S&P 500 stocks weighted to

varying degrees, deciding which stocks to include in these new sectors is im-

portant. As before, coefficients within a particular eigenvector define the

importance of a stock within the eigenvector. Thus taking the stocks with

the highest valued (absolute valued) coefficients should suitably define these

new sectors. The composition of S&P sectors ranges from 10 stocks to 90

stocks. So to define these new PCA sectors a judgement will have to be

made as to how many stocks to include in each sector. Unlike the industry

breakdown, these sectors are not mutually exclusive and stocks may appear

in several sectors. The most important criterion for how many stocks to select

is how well this group of stocks approximates the entire eigenvector’s perfor-

mance. To make this judgement, observe how closely sectors comprised of

the first 30 (Blue Line), 50 (Green Line), 75 (Black Line), 100 (Orange Line)

stocks match the movement of the whole eigenvector (Red Line) in figure

3.5. Then choose the smallest number of stocks that still approximates the

eigenvector reasonably well to form the new sectors.
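Selecting a sector's members by coefficient magnitude can be sketched as follows; the vector and the cutoff k are illustrative:

```python
import numpy as np

# A "PCA sector" is the k stocks with the largest absolute coefficients in
# an eigenvector (sign only determines long versus short, not membership).
def pca_sector(eigvec, k):
    """Indices of the k stocks with the highest-magnitude coefficients."""
    eigvec = np.asarray(eigvec)
    return np.argsort(np.abs(eigvec))[::-1][:k]

# Illustrative five-stock eigenvector: stocks 1 and 3 dominate.
vec = np.array([0.05, -0.80, 0.10, 0.60, -0.02])
print(pca_sector(vec, 2))
```

In the actual analysis, k ranges over 30, 50, 75, and 100, and the smallest k whose sector tracks the full eigenvector is kept.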

The results of this analysis are below. If none of the stock amounts


Figure 3.5: First Component Potential Sectors

Figure 3.6: Second Component Potential Sectors

Figure 3.7: Third Component Potential Sectors

Figure 3.8: Fourth Component Potential Sectors

Figure 3.9: Fifth Component Potential Sectors

Figure 3.10: Sixth Component Potential Sectors


Figure 3.11: Seventh Component Potential Sectors

Figure 3.12: Eighth Component Potential Sectors

Figure 3.13: Ninth Component Potential Sectors

Figure 3.14: Tenth Component Potential Sectors


approximated the eigenvector very well, then the best was chosen; and if

larger stock counts improved the approximation only marginally, then the

smaller count was chosen.

Sector    # Stocks in Sector

   1              30

   2             100

   3              50

   4             100

   5             100

   6              30

   7             100

   8              30

   9              75

  10             100

The results are actually slightly disappointing and indicate that only a

few eigenvectors, if not just the first, can be represented by a smaller sub-

set of stocks. However, examining the eigenvector trends that demonstrate

distinct correlation, anti-correlation and neutrality with market movements

does provide some possibility of creating a hedging strategy to gain excess

returns. Furthermore one very interesting result to come from this inves-

tigation is shown in figure 3.15. The figure shows that a trading strategy

formed solely on the premise of always holding the first eigensector will over

a twenty year period tend to outperform the S&P. This is certainly a re-

markable feat and further investigation should be dedicated to determining

whether this is a feasible trading strategy or whether transaction cost will

destroy this potential strategy for excess returns.


[Figure: First Sector Returns vs. S&P 500 Returns, plotted against time in

months.]

Figure 3.15: First Sector (Blue Line) Returns against S&P Returns (Red

Line)


3.4.5 Trading Strategy

Having chosen sectors, we now engage in an exercise aimed at using these

sectors to gain excess returns. We’ve seen already that buying the first sec-

tor (the first 30 stocks of the first eigenvector) tends to outperform the S&P.

This might be a decent trading strategy in itself; however, unless executed at

large scales, transaction costs will likely undercut this potential for excess

returns. Most other eigensectors have somewhat erratic return trends. This

makes them unsuitable for use in this trading strategy. However the first

four sectors along with the eighth have remarkably predictable trends and

may provide a way to realize significant returns. The first sector produces

the highest returns, however is susceptible to market crashes. The second

sector produces much smaller returns, but seems more resilient to crashes

and downward swings. It seems right to let these form a long position.

The short position shall be the combination of the 4th eigensector and the

8th eigensector, both of which are anti-correlated with the market. If the

sacrifice in overall returns due to a linear combination of these four sectors

is sufficiently compensated by lower volatility, then the combination is a

better portfolio. Figure 3.16 shows the results of a few combinations. The

dotted red line is the S&P performance. Above, in black, is the first eigensec-

tor alone. Below is an equal weighted combination of all four eigensectors

in green. Finally in blue, a combination of sectors with 80% investment in

the first sector and the remaining 20% distributed among the other three

sectors is displayed. Additionally, the third sector could be included in this

strategy, though this possibility will not be explored here. The volatility of

the third eigensector does suggest that in a more advanced and comprehen-

sive strategy the third eigensector may be exploited using derivatives. A

straddle would be an appropriate tool for this function.
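The mechanics of these linear combinations can be sketched as below; the sector return series are synthetic stand-ins, and the weights mirror the equal-weighted and 80%-tilted combinations described above:

```python
import numpy as np

# Synthetic monthly return series standing in for the four eigensectors
# used in the strategy (first, second, fourth, eighth).
rng = np.random.default_rng(2)
months = 240
sec1, sec2, sec4, sec8 = rng.normal(0.01, 0.05, size=(4, months))

def combine(weights, sectors):
    """Return series of a weighted eigensector combination.
    Negative weights correspond to short positions."""
    W = np.asarray(weights, dtype=float)[:, None]
    return (W * np.asarray(sectors)).sum(axis=0)

# Long the first and second sectors, short the fourth and eighth.
equal = combine([0.25, 0.25, -0.25, -0.25], [sec1, sec2, sec4, sec8])
tilted = combine([0.80, 0.10, -0.05, -0.05], [sec1, sec2, sec4, sec8])

# The trade-off is judged by mean return versus volatility.
for name, p in [("equal", equal), ("tilted", tilted)]:
    print(name, p.mean(), p.std())
```

With the real eigensector data, the green and blue curves of figure 3.16 correspond to the `equal` and `tilted` combinations.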

The overall result of the exercise is that lower volatility can be achieved

but not without overly sacrificing returns as demonstrated by the green line.

Furthermore the blue line shows that a more reasonable returns sacrifice


does little to eliminate volatility. The inability to find a linear combination

of stocks that produces excess returns or decreases volatility with only a

slight drop in returns is disappointing. And it is largely the result of the

eigenvector structure, which is dominated by a first eigensector that generates

far superior returns to any other eigensector. The simplicity of the market

structure represented by this picture is in itself surprising.

However enough information has been found to suggest an alternative

trading strategy revolving solely around the first eigensector, which has al-

ready been demonstrated to gain excess returns. Since the returns of the

market can be approximated reasonably well by the top 30 stocks in the

first eigenvector, one might consider examining the prospects of these firms

before investing to determine an average projection of returns over the next

month. Provided this average is positive, investments in the market and

these 30 stocks in particular should be made. If on the other hand the

overall picture delivered by these 30 stocks seems weak, perhaps one should

avoid investing to avoid potential losses. In this way PCA identifies the few

stocks that provide the most accurate composite picture of future market

movement.

3.4.6 Market Memory

The fluctuations of eigenvector composition through time demonstrate an

expected temporal variation in market structure. Eigenvector composition

adjusts to best describe current market trends, which are themselves tem-

porary and changing. A parameter of interest to financial analysts is the

duration of these trends. It may then be informative to examine the extent

to which eigenvectors change as a function of time. Such an examination

could shed light on the financial concept of market memory, which defines

how long past trends and events persist in affecting today’s market. Market

memory is an important concept for traders, who need to judge when to

“cash out.” If they were to have a deeper understanding as to how long


[Figure: Portfolio Returns against time in months for the eigensector com-

binations described in the text.]

Figure 3.16: Portfolio returns for linear eigensector combinations.

trends persist in general, then they may be in a better position to judge

when current trends will subside. The dynamics of the eigenvectors may

provide an objective method of estimating market memory. There are two

related paths of investigation that should produce results.

The first test to conduct will measure at each time period how many

months must pass before less than X percent of the original eigensector

remains. That is, at time t0 the first eigensector, E0 is found. Then the first

eigensector is found for time t1, t2, t3, ... until

|E_0 ∩ E_n| / 30 ≤ X

where 30 is the total number of stocks in the sector and n is the value

for market memory or number of months passed. For such a test the first

eigensector, which captures the majority of all trends, appears to be the

strongest candidate for investigation. If this test were executed using lesser

eigensectors, defined by weaker trends, a progressively lower value for mar-

ket memory should be expected. The result of the test is the relationship

Page 48: PCA Finance

CHAPTER 3. PRINCIPAL COMPONENT ANALYSIS 43

between percent of original eigensector stocks remaining, X, and time, n.
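The thesis's computations were done in MATLAB; the following Python sketch illustrates the overlap test on synthetic returns. The eigensector construction here (the top_k stocks by absolute weight in the leading eigenvector of the return covariance) and all parameter values are assumptions for illustration, not necessarily the author's exact definitions:

```python
# Illustrative sketch of Test 1 on synthetic data (not S&P 500 returns).
import numpy as np

rng = np.random.default_rng(0)
n_stocks, n_months, window, top_k = 30, 120, 24, 10
returns = rng.normal(0.01, 0.05, size=(n_months, n_stocks))  # synthetic monthly returns

def first_eigensector(window_returns, k=top_k):
    """Indices of the k stocks weighted most heavily in the first eigenvector
    of the return covariance matrix (an assumed stand-in for the thesis's
    eigensector definition)."""
    cov = np.cov(window_returns, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    v1 = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    return set(np.argsort(np.abs(v1))[-k:].tolist())

E0 = first_eigensector(returns[:window])      # eigensector at t0
overlaps = []
for n in range(1, 13):                        # slide the window forward n months
    En = first_eigensector(returns[n:n + window])
    overlaps.append(len(E0 & En) / len(E0))   # fraction of original stocks remaining
```

Scanning `overlaps` for the first lag at which the fraction drops below the threshold X then yields the market-memory estimate n.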

Figure 3.17: Test 1 - Percent of first eigensector that remains as a function of time. [Plot "Percent of First Eigensector Overlap vs. Time": Percent of Original Stocks that Remain in the First Sector vs. Time (Months); image not reproduced.]

We note several features of the result. The first is that the overlap decreases exponentially as a function of time, not linearly as might be expected. Furthermore, there is significant overlap from month to month: on average, 85% of the first eigensector remains after a single month. However, after approximately six months, the overlap is only 25%.
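The exponential decay suggested by Figure 3.17 can be summarized by a single time constant. The sketch below fits X(n) ≈ exp(-n/τ) by log-linear least squares; the intermediate overlap values are hypothetical, chosen only to interpolate between the quoted 85% one-month and 25% six-month figures:

```python
# Sketch: estimating a market-memory decay constant from eigensector overlap,
# assuming the exponential form X(n) ~ exp(-n / tau). The overlap values are
# illustrative, not measured data from the thesis.
import numpy as np

months = np.array([1, 2, 3, 4, 5, 6])
overlap = np.array([0.85, 0.70, 0.55, 0.42, 0.32, 0.25])  # hypothetical values

# log-linear least squares: ln X = -n / tau + c  =>  slope = -1 / tau
slope, _ = np.polyfit(months, np.log(overlap), 1)
tau = -1.0 / slope   # market-memory time constant in months
```

With these illustrative numbers the fitted time constant comes out at roughly four months, consistent with overlap largely vanishing by the six-month mark.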

The second test will measure the Euclidean distance of a t0 eigenvector

to n future eigenvectors. The Euclidean distance between two eigenvectors

is defined as,

||E0 − En||

Here the distance serves as a measure of past correlation, where large distances indicate little eigenvector overlap. The distance is expected to grow as time moves forward, reflecting fading market memory, and should show how quickly and at what rate that memory fades.
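A sketch of the second test, again in Python on synthetic returns (window length and data are illustrative assumptions). One practical detail worth noting: eigenvectors are only defined up to sign, so a sign convention must be fixed before distances between them are comparable:

```python
# Illustrative sketch of Test 2 on synthetic data: Euclidean distance
# ||E0 - En|| between the first eigenvector at t0 and at later windows.
import numpy as np

rng = np.random.default_rng(1)
n_stocks, n_months, window = 30, 120, 24
returns = rng.normal(0.01, 0.05, size=(n_months, n_stocks))

def first_eigenvector(window_returns):
    """Unit eigenvector of the largest eigenvalue of the return covariance."""
    cov = np.cov(window_returns, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    return v if v.sum() >= 0 else -v   # eigenvectors have arbitrary sign; fix one

v0 = first_eigenvector(returns[:window])
distances = [np.linalg.norm(v0 - first_eigenvector(returns[n:n + window]))
             for n in range(1, 25)]    # ||E0 - En|| for n = 1..24 months
```

Since both vectors are unit length, the distance is bounded by 2; a plateau in `distances` after some lag is the signature of fully faded memory.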

Figure 3.18: Test 2 - Euclidean Distance between first eigenvectors at different times. [Plot "Euclidean Distance vs. Time": Euclidean Distance vs. Time (Months); image not reproduced.]

The result reflects a quick, nearly linear increase in Euclidean distance

over a period of six months. This begins with a sharp increase in distance

within the first month, which corresponds to dropping eigenvector overlap.

Nearly 30% of the maximum Euclidean distance is obtained during this first

period, indicating that some trends are destroyed within a single month.

This result matches well with the previous test, which also indicates a nearly 30% drop in overlap within the first month.

After the six-month mark, the Euclidean distance begins to plateau. This feature indicates two things: residual trends that persist over a period of roughly two years, and intermediate trends that are destroyed after six months. Neither feature is as clearly visible in the first test.

In conclusion, it seems these results support the classic division of trends

into three types: short, intermediate, and long-term. Intermediate trends last a maximum of six months and long-term trends can last several years. The investigation also indicates that short-term trends last approximately


a month. However, given that a month is the minimum time step of this investigation, future work could use weekly or even daily data to examine eigenvector overlap within the first month; this is expected to follow the same exponential trend indicated by these results.


Chapter 4

Conclusion

In this paper we demonstrate potential areas of overlap between the fields of physics and finance. The motivation is that the rigorous mathematical methods in use in physics may also prove useful for financial analysis. The field of astrophysics provides a suitable starting place for such a search because its statistical challenges are similar to those in finance. Our application to finance of extreme value theory, which has been used to determine the nature of Bright Cluster Galaxies, and of Principal Component Analysis, which is used extensively to analyze large astrophysical data sets, has met with mixed results. Though these results are not overwhelming, they are encouraging enough to suggest that continued investigation into this potential overlap is a worthy effort.

Notable successes of this investigation include the ability to quantify market memory using PCA and the remarkable capacity of the first principal component to describe and outpace the S&P 500. The demonstrated potential of an EVT-based trading strategy to produce excess returns also means that, though the strategy is erratic, further refinement may yet provide a path to beating the S&P index.

However, to our disappointment, PCA failed to reveal any firm market


signals or demonstrate the existence of deep market structure. Furthermore, the erratic returns of the EVT trading strategy remain worrisome.

The investigation has, however, also uncovered many avenues for future work. Future studies might focus on a more rigorous analysis of the EVT trading strategy, finding optimal parameter values; on a new trading strategy based on the performance of the first principal component; or on eigenvalue movement on a shorter time scale just prior to market crashes, to further probe the potential existence of market signals.

In conclusion, mathematical methods in use in physics may provide new and important tools for financial analysis. Our application of extreme value theory and principal component analysis has met with encouraging but not overwhelming results. The investigation suggests that further work should be pursued, though without expectation of dramatic insight. Furthermore, the next step in the melding of these two fields should be the implementation of mathematical methods in finance based not only on overlapping mathematical challenges but also on a more fundamental conceptual overlap, using analogy to model the financial market on physical phenomena.


Bibliography

[1] Bhavsar, Suketu P. Probing the Nature of the Brightest Galaxies Us-

ing Extreme Value Theory. Conference on Extreme Value Theory and

Applications, 1993. Dordrecht: Kluwer Academic, 1994.

[2] S.P. Bhavsar, The Astrophysical Journal, 338, 718 (1989).

[3] Bujarrabal, V., J. Guibert, C. Balkowski. “Multidimensional Statistical

Analysis of Normal Galaxies.” 1981, Astronomy and Astrophysics, 104,

1-9.

[4] Deeming, T. “Stellar Spectral Classification.” Mon. Not. R. Astron. Soc.

127, 493-516, 1964.

[5] Feigelson, Eric D., and G J. Babu, eds. Statistical Challenges in Modern

Astronomy. New York: Springer, 1992.

[6] Feigelson, Eric D., and G J. Babu, eds. Statistical Challenges in Modern

Astronomy. New York: Springer, 2003.

[7] Galambos, Janos, James Lechner, and Emil Simiu, eds. Conference

on Extreme Value Theory and Applications, 1993. Dordrecht: Kluwer

Academic, 1994.

[8] G. Bhanot, Personal Correspondence, October 2005 - May 2006.

[9] Jackson, J Edward. A User’s Guide to Principal Components. New

York: Wiley-Interscience, 1991.


[10] Murtagh, F, and A Heck. Multivariate Data Analysis. Dordrecht: D.

Reidel, 1987.

[11] Muldoon, Cecilia. Extreme Value Theory: Bright Cluster Galaxies and

the Stock Market. (unpublished reference), 2005.

[12] Pelat, D. “A Study of HI Absorption Using Karhunen-Loeve Series.”

Astron. & Astrophys. 40, 285-290 (1975).

[13] Strauss, Michael. “Reading the Blueprints of Creation.” Scientific

American, February, 2004: 54-61.

[14] Tegmark, Max, Andy N. Taylor, and Alan F. Heavens. “Karhunen-

Loeve Eigenvalue Problems in Cosmology: How Should We Tackle

Large Data Sets.” The Astrophysical Journal 480 (1997): 22-35.

[15] Vogeley, M.S. & Szalay, A.S. “Eigenmode Analysis of Galaxy Redshift

Surveys.” The Astrophysical Journal 465 (1996): 34.

[16] Whitney, C.A. “Principal Component Analysis of Spectral Data.” As-

tron. Astrophys. Suppl. Ser. 51, 443-461 (1983).

[17] http://en.wikipedia.org/wiki/Gumbel_distribution, accessed April 24, 2006.