
Comparison of Different Methods in Forecasting Stocks' Returns or Prices

Zhicheng Li/Sirui Zhang/Haoran Jiang

Abstract

In this paper, four models are built to explain stock behavior, and the corresponding methods are used to forecast stocks' returns or prices in the S&P 500 universe. All forecasts are compared with the realized values. We show that the traditional time series methods, both univariate (the AR model) and multivariate (the VAR model), give little forecastability. In contrast, the methods based on statistical arbitrage, i.e., the Pair Trading and Market Neutral models, perform much better. Along the way, we introduce some statistical techniques, such as Principal Components Analysis (PCA) and the concept of mean-reversion. Finally, econometric and statistical analysis is used to give a reasonable interpretation of the results.

1 Introduction

Forecasting is an everlasting topic not only in Economics but also in Finance. In the stock market, the incentive to forecast well is particularly strong, in the sense that people who predict better make more money. Therefore, a great deal of research has been done and various models and methods have been proposed and used. Before the age of computers, people traded stocks and commodities mainly on intuition. As the level of trading and the technology grew, people searched for tools and methods that would increase their gains while minimizing their risk. Statistics, fundamental analysis, and linear/non-linear regressions are all attempts to predict and benefit from the market's direction [5]. In recent studies, some new techniques, such as Neural Networks, Hidden Markov Models (HMM) and Genetic Algorithms (GA), are used to forecast stocks' activity [9][10][13]. None of these techniques has proven to be as consistently correct as desired, and many skeptics question the utility of many of these approaches. Nevertheless, these methods are commonly used in practice.

In our paper, we present four models applied to the S&P 500 stock market. In each model, we state the concrete method for forecasting. Given a particular time window in the S&P universe, we forecast the stocks' prices or returns, then compare the forecasts with the real values by calculating correlations. Finally, we look at the performance of each method. The first model we start with is the Auto-Regression (AR) model, which is broadly used in time series analysis [5][7]. It assumes that a stock behaves in an autocorrelated and stochastic way, and is not correlated with other stocks or factors. Basically, this method attempts to model a linear function by a recurrence relation derived from past values. The recurrence relation can then be used to predict new values in the time series, which hopefully will be good approximations of the actual values. In the second model, we consider that two stocks, especially in one common industry, tend to be correlated, i.e., a pair of stocks' prices may have a statistical relationship called cointegration. We exploit this property and implement pair trading in the second model. This model is the ancestor of statistical arbitrage, which is now a widely used method in the investment area [16][14]. In the third model, we extend our idea to the point where an individual stock is very possibly influenced by the whole market. We hope to find the common market factors that each stock may depend on. Therefore, a statistical method called Principal Component Analysis (PCA) is employed to extract these common market factors [17], i.e., Principal Components (PCs). By regressing each stock on the PCs, we infer their relationship, and then with a VAR model, which is a multivariate time series model, we forecast how the PCs evolve. We then put the predicted PCs back into the original regressions and forecast individual stocks. The last model we apply is the market neutral model, in which we form a portfolio whose expected returns are unrelated to the market fundamentals. Regardless of how the market fluctuates, the portfolio's return is just a stationary mean-reverting process. By using mean-reversion, which is a very important technique in statistical arbitrage [3], we look for the opportunities that would give us large expected returns, and then compare these returns with the real values.

The paper is organized as follows. In Section 2, we introduce the S&P 500 stock market data that we use, and we diagnose and describe some properties of this data set. Section 3 is divided into four parts. Each part sets forth a model of stocks' behavior and a method for forecasting stocks' prices or returns in our case. In Section 4, we show the results of these four methods, compare their performance, and attempt a detailed analysis. Finally, we draw conclusions in the last section.

2 Data and Stylized Facts

In this paper, we use a database of the S&P 500 (Standard & Poor's 500) from 1989 to 2012. The data source is CRSP (Center for Research in Security Prices), which is part of the University of Chicago and renowned for its expertise in building and maintaining historical, academic research-quality stock market databases. The reason for choosing the S&P 500 is that it comprises nearly 500 common stocks issued by 500 large-cap companies, and covers about 75 percent of the American equity market by capitalization. Moreover, the S&P 500 index is one of the most commonly followed equity indices, and many consider it one of the best representations of the U.S. stock market and a bellwether for the U.S. economy [1] (see Figure 1).


http://www.crsp.com/

Figure 1: Historical S&P 500 Earning and US Nominal GDP

The components of the S&P 500 are selected by a committee. This is similar to the Dow Jones Industrial Average, but different from indices such as the Russell 1000, which are strictly rule-based. When considering the eligibility of a new addition, the committee assesses the company's merit using eight primary criteria: market capitalization, liquidity, domicile, public float, sector classification, financial viability, length of time publicly traded and listing exchange [2]. The committee selects the companies in the S&P 500 so that they are representative of the industries in the United States economy. In order to be added to the index, a company must satisfy these liquidity-based size requirements: i) market capitalization is greater than or equal to US$4.0 billion; ii) the ratio of annual dollar value traded to float-adjusted market capitalization is greater than 1.0; iii) a minimum monthly trading volume of 250,000 shares in each of the six months leading up to the evaluation date. Therefore, the set of companies in the S&P 500 is not static: from time to time one company drops out of the list and another enters. That is why we see records for 1127 stocks in our data.

The stock prices in this data set are end-of-day prices. As we have roughly 252 business days a year, there are 5799 time records. In addition, these prices are adjusted for dividends and for changes in the number of shares. Thus, the trend of a stock's price can reflect the market value of that company. Moreover, we normally think that a price's increment is proportional to the price itself, so the trend of a stock's price is exponential (see Figure 2) and the log-prices would be an I(1) process, which means the log-returns (first differences of log-prices) are stationary (Stock and Watson (1988b)). Table 1 gives the results of ADF tests for all the stocks, which show that log-prices are essentially I(1) processes with a unit root and log-returns are stationary processes.


Figure 2: Five S&P 500 Stocks Prices’ Evolution

As we have a long time series over a broad universe of U.S. equities, we can use back-testing to compare different methods for forecasting stocks' prices or returns. The principle is as follows: we set two parameters, the historical window and the forecasting window. Given the data in the historical window, we predict the prices or returns in the forecasting window, and then compare them with the actual data. The historical window moves over time, so we obtain a series of comparison results on which to base a judgment. Another issue is that within a particular historical window, some companies do not belong to the S&P 500 or have no data, so we need to restrict the dataset to those stocks that existed continuously in that period. Standard & Poor's believes that turnover in index membership should be avoided whenever possible. Hence companies that were added to the index usually stay in the index unless too many of the addition criteria have been violated or the company no longer exists due to mergers and acquisitions [2]. Thus, even with the selection bias mentioned before, within a historical window that is not too long, we can consider that stocks behave naturally.
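The moving-window procedure described above can be sketched as follows. This is only an illustration, not the code used for the results in this paper: the function names, the `step` parameter and the dictionary layout of `prices` (ticker mapped to a list of daily prices, with `None` where no record exists) are assumptions made for the example.

```python
def rolling_windows(n_obs, hist_len, fcst_len, step=1):
    """Yield (historical, forecasting) index slices for a moving back-test."""
    start = 0
    while start + hist_len + fcst_len <= n_obs:
        hist = slice(start, start + hist_len)
        fcst = slice(start + hist_len, start + hist_len + fcst_len)
        yield hist, fcst
        start += step


def stocks_with_full_history(prices, hist, fcst):
    """Keep the tickers that have a price on every day of both windows.

    `prices` is a hypothetical layout: ticker -> list of daily prices,
    with None on days where the stock has no record.
    """
    return [
        ticker
        for ticker, series in prices.items()
        if all(p is not None for p in series[hist.start:fcst.stop])
    ]
```

Each pair of slices selects one historical window and the forecasting window that immediately follows it; filtering the universe per window is what keeps only the stocks that existed continuously in that period.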

Table 1: Results of ADF tests for log-prices and log-returns processes
H0: the series has a unit root (5% level)

              Ratio of stocks that accept   Ratio of stocks that reject
log-prices    95.08%                        4.92%
log-returns   0%                            100%


3 Models and Methods

3.1 Simple Autoregression Model

We begin with a very simple model, the autoregressive (AR) model, which is widely used for single time series problems. Suppose we are interested in forecasting the value of a variable $Y_{t+1}$ based on a set of variables $X_t$ observed at date $t$. In this case, $X_t$ consists of a constant plus $Y_t, Y_{t-1}, \ldots, Y_{t-m+1}$. The common methodology is to choose the forecast $Y^*_{t+1|t}$ so as to minimize

$$E\left(Y_{t+1} - Y^*_{t+1|t}\right)^2 \tag{1}$$

which is the mean squared error. $Y^*_{t+1|t}$ has a functional form $g(X_t)$ based on the current information, so the problem is to find the function $g(X_t)$ that minimizes

$$E\left(Y_{t+1} - g(X_t)\right)^2 \tag{2}$$

When we use a linear projection, i.e., $g(X_t)$ is a linear combination of $Y_t, \ldots, Y_{t-m+1}$, equation 2 leads to an AR model. In our paper, we choose just two lags and have the regression model:

$$Y_t - u = \phi_1 (Y_{t-1} - u) + \phi_2 (Y_{t-2} - u) + \varepsilon_t \tag{3}$$

We use a two-lag linear projection, rather than selecting the number of lags by other methods (AIC/BIC) [8] or using non-linear models, because we think there is a trade-off between the sample size, the number of parameters to be estimated, and the credibility of the model. Estimating many parameters may cause a loss of precision in the estimation process. And because we do not have a 'true' model governing stock prices or returns (Black (1986)), we can use the model we have built as long as it is effective to the extent we expect.

Returning to equation 3, if we can assume $E(\varepsilon_t \mid Y_{t-1}, Y_{t-2}) = 0$ and the process $\{Y_t, [Y_{t-1}, Y_{t-2}]\}$ is covariance-stationary and ergodic for second moments, then the OLS regression yields a consistent estimate of the coefficients (Hamilton (1994)). Alternatively, we can rewrite equation 3 in the form:

$$\phi(L)(Y_t - u) = \varepsilon_t \tag{4}$$

where the autoregressive operator is $\phi(L) = (1 - \phi_1 L - \phi_2 L^2)$. As long as all the roots of $\phi(z) = 0$ lie outside the unit circle, the autoregression satisfies the stationarity condition.
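The stationarity condition on the roots of $\phi(z) = 0$ can be checked numerically. The snippet below is a minimal sketch assuming NumPy is available; the function name `ar2_is_stationary` is introduced here for illustration only.

```python
import numpy as np


def ar2_is_stationary(phi1, phi2):
    """True if all roots of 1 - phi1*z - phi2*z**2 = 0 lie outside the unit circle."""
    # np.roots expects the coefficients ordered from the highest power down.
    roots = np.roots([-phi2, -phi1, 1.0])
    return bool(np.all(np.abs(roots) > 1.0))
```

For instance, a pure random walk ($\phi_1 = 1$, $\phi_2 = 0$) has its root exactly on the unit circle and is therefore rejected.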

In this AR model, we choose log-returns, which are already stationary processes, as our forecasting object. Specifically, if we define $Y_{it}$ as the log-return of stock $i$ at time $t$, then equation 3 becomes

$$Y_{it} = \beta_{0i} + \beta_{1i} Y_{i,t-1} + \beta_{2i} Y_{i,t-2} + \varepsilon_{it} \tag{5}$$

If the previous assumptions hold, we can apply OLS to this regression and get consistent estimators $\hat{\beta}_{ki}$ $(k = 0, \ldots, 2;\ i = 1, \ldots, N)$. Note that this is not a panel data regression: there is a separate regression for each stock, and the coefficients vary between stocks. Furthermore, we set the length of the moving historical window to 1000 days, and we want to forecast the next-day return $E(Y_{i,t+1})$ of stock $i$, which is

$$E(Y_{i,t+1}) = \hat{\beta}_{0i} + \hat{\beta}_{1i} Y_{it} + \hat{\beta}_{2i} Y_{i,t-1} \tag{6}$$

Finally, we compare the forecast returns with the real returns; the results are shown in Section 4.
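Equations 5 and 6 amount to one small OLS problem per stock and per historical window. The following is a minimal sketch assuming NumPy; `ar2_forecast` is a hypothetical helper name, and the input is taken to be the log-returns of a single stock inside one window.

```python
import numpy as np


def ar2_forecast(returns):
    """Fit Y_t = b0 + b1*Y_{t-1} + b2*Y_{t-2} + e_t by OLS (equation 5)
    and return the coefficients with the next-step forecast (equation 6)."""
    y = np.asarray(returns, dtype=float)
    target = y[2:]
    # Columns: constant, first lag, second lag.
    design = np.column_stack([np.ones(len(target)), y[1:-1], y[:-2]])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    forecast = beta[0] + beta[1] * y[-1] + beta[2] * y[-2]
    return beta, forecast
```

In a back-test this function would be called once per stock for every position of the 1000-day window, and the forecast paired with the realized return of the following day.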

3.2 Pair Trading Model

The assumptions in the previous model are very strong. It is unlikely that stocks move by themselves and are uncorrelated with others. In other words, it is more plausible to think that stocks may be correlated, especially within the same industry. Figure 3 shows an example: the price evolution of two stocks in the same industry, 'Petroleum Refining' (SIC: 2911), from 1989 to 1990, which appear to be highly correlated. Hence, in this model, we adopt a relationship commonly used in time series analysis, namely cointegration. Rather than dealing with log-returns, which are stationary processes, we consider the log-prices, which are integrated of order 1. If stocks $i$ and $j$ are in the same industry or have similar characteristics, one expects to obtain a positive profit by hedging one stock with the other (see Pole (2008)). In particular, denote by $P_{it}$ and $P_{jt}$ the corresponding price series, which we model as

$$\ln(P_{it}) = \alpha t + \beta \ln(P_{jt}) + X_t \tag{7}$$

where $X_t$ is a stationary, or mean-reverting, process. The relation between these two log-prices, which are I(1) series, is then one of cointegration. Taking the first difference of equation 7, the log-returns should satisfy

$$\ln(R_{it}) = \alpha\,dt + \beta \ln(R_{jt}) + dX_t \tag{8}$$

In many situations, the drift $\alpha$ is small compared to the fluctuations of $X_t$ and can be neglected. The mean-reversion of $X_t$ then suggests that we could form a long-short portfolio in which we go long 1 dollar of stock $i$ and short $\beta$ dollars of stock $j$ if $X_t$ is small, and conversely go short stock $i$ and long stock $j$ if $X_t$ is large. Both positions are expected to earn positive returns. This mean-reversion paradigm is typically associated with market over-reaction: assets are temporarily under- or over-priced with respect to one or several reference securities (Lo and MacKinlay (1990)).

For our dataset, the concrete method is as follows. First, within one historical window, we find a pair of stocks in a given industry that are cointegrated without a deterministic trend (in our data, we use the SIC code to identify industries). Denoting them as stocks $i$ and $j$ and regressing one on the other, we have:

$$\ln(P_{it}) = \beta \ln(P_{jt}) + X_t \tag{9}$$

And correspondingly, for log-returns,

$$\ln(R_{it}) = \beta \ln(R_{jt}) + dX_t \tag{10}$$


Figure 3: Prices of two stocks in the 'Petroleum Refining' industry from 1989 to 1990

As $X_t$ is a stationary process in which we expect to find the mean-reverting property, we use an AR(1) model to diagnose $X_t$:

$$X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t \tag{11}$$

Subtracting $X_{t-1}$ from both sides, we get

$$dX_t = \beta_0 + (\beta_1 - 1) X_{t-1} + \varepsilon_t \tag{12}$$

Mean-reversion requires $(\beta_1 - 1) < 0$, and the more negative this coefficient, the stronger the mean-reversion. Therefore, the next step is, within a particular historical window $(t = 1, \ldots, T)$, to search over all the stocks, find the top ten mean-reverting pairs, and denote them as $\{i^*, j^*\}_{10}$. For these ten pair-trading portfolios, we then need to forecast the next-day returns. Evaluating equation 10 at $T+1$ gives

$$\ln(R_{i^*,T+1}) - \beta \ln(R_{j^*,T+1}) = dX^*_{T+1} \tag{13}$$

which means that going long 1 dollar of stock $i^*$ and short $\beta$ dollars of stock $j^*$ would give us an expected return $E_T(dX^*_{T+1})$. Moreover, from equation 12, it is easy to see that

$$dX^*_{T+1} = \beta^*_0 + (\beta^*_1 - 1) X^*_T + \varepsilon^*_{T+1} \tag{14}$$

If the assumption $E_T(\varepsilon^*_{T+1}) = 0$ is valid, which is also the requirement for a consistent estimator in the AR(1) model, we can derive the result:

$$E_T(dX^*_{T+1}) = E_T\{\ln(R_{i^*,T+1}) - \beta \ln(R_{j^*,T+1})\} = \beta^*_0 + (\beta^*_1 - 1) X^*_T \tag{15}$$

showing that the expected return on the next day $(T+1)$ of this pair trade is just $\beta^*_0 + (\beta^*_1 - 1) X^*_T$. We can then compare the forecast returns with the real returns from pair trading, which are $\ln(R_{i^*,T+1}) - \beta \ln(R_{j^*,T+1})$ in the forecasting window. The results of the comparison are shown in Section 4.


Moreover, if we want to form a strategy to make more money, among the ten pairs we have chosen we select the pair ($i^{**}$ and $j^{**}$) whose absolute expected return equals $\max\{|\beta^*_0 + (\beta^*_1 - 1) X^*_T|\}$, and trade only that pair. If the expected return is positive, we go long 1 dollar of stock $i^{**}$ and short $\beta$ dollars of stock $j^{**}$. If it is negative, we do the opposite: short 1 dollar of stock $i^{**}$ and go long $\beta$ dollars of stock $j^{**}$. Both cases give a positive expected return, i.e., $\max\{|\beta^*_0 + (\beta^*_1 - 1) X^*_T|\}$.
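For a single candidate pair, the chain from equation 9 to equation 15 can be sketched in a few lines. This is an illustrative sketch assuming NumPy, not the implementation behind the reported results; `pair_signal` is a hypothetical name, and the cointegration test that selects candidate pairs is left out.

```python
import numpy as np


def pair_signal(log_p_i, log_p_j):
    """Return (beta, b0, b1, expected_dX) for one candidate pair."""
    p_i = np.asarray(log_p_i, dtype=float)
    p_j = np.asarray(log_p_j, dtype=float)
    # Equation 9: regress ln(P_i) on ln(P_j) with no intercept and no trend.
    beta = float(p_j @ p_i / (p_j @ p_j))
    x = p_i - beta * p_j
    # Equation 11: AR(1) with intercept on the residual X_t.
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (b0, b1), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    # Equation 15: expected next-day return of long 1 dollar i, short beta dollars j.
    expected = b0 + (b1 - 1.0) * x[-1]
    return beta, float(b0), float(b1), float(expected)
```

Ranking the candidate pairs by `b1 - 1` (most negative first) gives the ten most mean-reverting pairs, and the sign of `expected` indicates which leg to hold long.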

3.3 VAR Model

From the previous model, we could see that cointegrated time series share at least one common trend. Both casual observation and economic theory suggest that many series might contain the same stochastic trend, so that they are cointegrated. If each of $n$ series is integrated of order 1 and they can be jointly characterized by $k < n$ stochastic trends, then the vector representation of these series has $k$ I(1) processes and $n - k$ distinct stationary linear combinations. Stock and Watson (1988a) propose a technique for extracting the common stochastic trends by Principal Components Analysis (PCA). As we already know that log-prices are I(1) processes, we can regress each log-price process on these cointegrated Principal Components (PCs), and the residual we obtain should be stationary. Alternatively, we can directly use log-returns, which are already stationary processes, in which case the principal components and the regression residuals are all stationary.

Here we briefly introduce PCA. PCA is a statistical method that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance, that is, it accounts for as much of the variability in the data as possible. Each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. Thus, we can preserve most of the information in the original data while reducing the dimension of the dataset, i.e., we obtain a small number of common stochastic trends.

The detailed procedure in our case is as follows. Within one historical window $(t = 1, \ldots, T;\ i = 1, \ldots, N)$, we first standardize each stock's log-prices $(p_i)$ by their volatility:

$$Y_{it} = \frac{p_{it} - \bar{p}_i}{\bar{\sigma}_i} \tag{16}$$

where

$$\bar{p}_i = \frac{1}{T} \sum_{t=1}^{T} p_{it}; \qquad \bar{\sigma}_i^2 = \frac{1}{T-1} \sum_{t=1}^{T} (p_{it} - \bar{p}_i)^2$$

Then we calculate the covariance matrix of $Y_{it}$ (which here is also the correlation matrix). It is denoted $C$, with

$$C_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} Y_{it} Y_{jt} \tag{17}$$

which is symmetric and non-negative definite. Notice that, for any stock $i$, we have $C_{ii} = 1$. The next step is to consider the eigenvectors and eigenvalues of the covariance matrix. Define $V$ as the matrix of eigenvectors and $\lambda$ as the corresponding eigenvalues, i.e.,

$$[V, \lambda] = \mathrm{Eig}(C) \tag{18}$$

As $V_i$ $(i = 1, \ldots, N)$ are the eigenvectors of the covariance matrix, they are orthogonal to each other and form an orthogonal basis of another space. We rank the eigenvalues in decreasing order:

$$N \ge \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \ldots \ge \lambda_N \ge 0$$

and define $V_1, V_2, V_3, \ldots, V_N$ as the corresponding eigenvectors. The spectrum of eigenvalues shows that it contains only a few large eigenvalues (see Figure 4). We can then choose the top $K$ eigenvectors, which correspond to the $K$ largest eigenvalues. From Jolliffe (2005), we know that the projection of the original data on these top eigenvectors $V_1, V_2, V_3, \ldots, V_K$ (the principal basis of the new space) preserves most of the information.

Figure 4: Eigenvalues of the correlation matrix of stocks’ log-prices computedon the first historical window (t=1. . . 100)

Thus, we project the log-prices data in the historical window on these top eigenvectors and get $K$ principal components $(F_k,\ k = 1, \ldots, K)$:

$$F_{tj} = \sum_{i=1}^{N} \frac{V_{ji}}{\bar{\sigma}_i}\, p_{ti}, \qquad t = 1, \ldots, T;\quad j = 1, \ldots, K \tag{19}$$

For each stock's log-price process, we regress it on those common trends:

$$p_i = \theta_{i0} + \sum_{j=1}^{K} \theta_{ij} F_j + \delta_i, \qquad i = 1, 2, \ldots, N \tag{20}$$
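The standardization, eigen-decomposition and projection of equations 16 to 19 can be sketched as below. This is a minimal illustration assuming NumPy and a `(T, N)` array of log-prices for one historical window; the name `principal_components` is introduced only for this example.

```python
import numpy as np


def principal_components(log_prices, k):
    """log_prices: array of shape (T, N).
    Returns (factors of shape (T, k), top-k eigenvalues, top-k eigenvectors)."""
    p = np.asarray(log_prices, dtype=float)
    mean = p.mean(axis=0)
    sigma = p.std(axis=0, ddof=1)              # equation 16, T - 1 denominator
    y = (p - mean) / sigma
    corr = y.T @ y / (p.shape[0] - 1)          # equation 17
    eigvals, eigvecs = np.linalg.eigh(corr)    # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # indices of the K largest eigenvalues
    top_vals, top_vecs = eigvals[order], eigvecs[:, order]
    factors = p @ (top_vecs / sigma[:, None])  # equation 19
    return factors, top_vals, top_vecs
```

Because $C_{ii} = 1$ for every stock, the eigenvalues of the full decomposition sum to $N$, which gives a quick sanity check on the computation.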

As they are cointegrated, and if we can claim that the disturbance term is uncorrelated with the PCs, the OLS estimators $\hat{\theta}_{ij}$ $(i = 1, \ldots, N;\ j = 0, \ldots, K)$ are consistent. The next step is that, rather than auto-regressing and forecasting each single log-price process, we use a Vector Autoregression (VAR) model to forecast these common trends (PCs) and combine them to estimate each log-price process by putting them back into the original regression, equation 20. A VAR(p) model is written as a vector autoregression over the previous $p$ values of the series, in this case:

$$\vec{F}_t = \vec{c} + \phi_1 \vec{F}_{t-1} + \cdots + \phi_p \vec{F}_{t-p} + \vec{\varepsilon}_t \tag{21}$$

where

$$\vec{F}_t = \begin{pmatrix} F_{1t} \\ \vdots \\ F_{Kt} \end{pmatrix}; \quad \vec{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_K \end{pmatrix}; \quad \vec{\varepsilon}_t = \begin{pmatrix} \varepsilon_{1t} \\ \vdots \\ \varepsilon_{Kt} \end{pmatrix}; \quad \phi_s = \{\phi^s_{ij}\}_{K \times K} \tag{22}$$

Putting the forecast value of $\vec{F}_{t+1}$ into equation 20, we have

$$\hat{p}_{i,t+1} = \hat{\theta}_{i0} + \sum_{j=1}^{K} \hat{\theta}_{ij} F_{j,t+1} \tag{23}$$

The principle of this method is that, rather than treating the evolution of a stock price as a spontaneous and endogenous process, we think it is highly correlated with the whole market. As it is impossible to regress each stock on the whole set of other stocks, we extract a small number of common stochastic trends that largely represent the whole market. Through the evolution of these trends, we capture more of the information that influences a single stock's behavior. Admittedly, we encounter econometric problems similar to those of the single-series autoregression, and it is hard for us to justify the validity of the assumptions. However, as long as this model increases forecastability, it is effective to some extent.
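The two-stage forecast of equations 20 to 23 can be illustrated as follows. The sketch assumes NumPy and, for brevity, fixes the VAR order at $p = 1$, which is a simplification chosen for this example rather than the order used for the reported results; both function names are hypothetical.

```python
import numpy as np


def var1_forecast(factors):
    """Fit F_t = c + Phi F_{t-1} + e_t by OLS; return the one-step-ahead forecast."""
    f = np.asarray(factors, dtype=float)                     # shape (T, K)
    design = np.column_stack([np.ones(len(f) - 1), f[:-1]])
    coef, *_ = np.linalg.lstsq(design, f[1:], rcond=None)    # shape (K + 1, K)
    return np.concatenate([[1.0], f[-1]]) @ coef


def forecast_log_prices(log_prices, factors):
    """Equation 20 by OLS, then equation 23 with the forecast factors."""
    p = np.asarray(log_prices, dtype=float)                  # shape (T, N)
    f = np.asarray(factors, dtype=float)
    design = np.column_stack([np.ones(len(f)), f])
    theta, *_ = np.linalg.lstsq(design, p, rcond=None)       # shape (K + 1, N)
    return np.concatenate([[1.0], var1_forecast(f)]) @ theta
```

A higher-order VAR would only change the design matrix in `var1_forecast`, which would then stack $p$ lagged copies of the factors next to the constant.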

3.4 Market Neutral Model

Stocks' prices or returns are evidently influenced by market fundamentals. However, it is hard to build a model that takes all possible factors into account in order to explain and forecast fundamentals. Therefore, in this section, we consider a statistical arbitrage model in which the portfolio's return is not affected by market fundamentals. The common features of statistical arbitrage are that (i) trading signals are systematic or rules-based, (ii) the trading portfolio is market-neutral, in the sense that it has zero beta with the market, and (iii) the mechanism for generating excess returns is statistical. The idea is to make many bets with significantly positive expected returns at the appropriate time, and so produce a low-volatility investment strategy that is uncorrelated with the market.

Here we follow the paper by Avellaneda and Lee (2010) to build this model. First we form principal components of the log-returns of S&P 500 stocks over a certain period. For example, if we are at time $T$ and need to forecast the next period's stock returns, we use the past 60 days of records, i.e., the historical window is chosen as 60 days. Following the same principle as in the last section, we choose the $K$ most significant eigenvectors, those corresponding to the $K$ largest eigenvalues. Define these vectors as $V_i$ $(i = 1, \ldots, K)$. Then we project the log-return matrix $(60 \times N)$ on these eigenvectors and form $K$ market factors:

$$F_{tj} = \sum_{i=1}^{N} \frac{V_{ji}}{\bar{\sigma}_i}\, R_{ti}, \qquad j = 1, \ldots, K;\quad t = (T - 59), \ldots, T \tag{24}$$

where $F_{tj}$ is the $j$th market factor at time $t$. Note that these market factors are dynamic, because they change as the historical window moves forward.

Then we regress each stock's log-returns on these market factors:

$$R_i = m_i + \sum_{j=1}^{K} \beta_{ij} F_j + \tilde{R}_i, \qquad i = 1, 2, \ldots, N \tag{25}$$

The returns, the principal components and the residuals are all stationary, and we can assume $E(\tilde{R}_i) = 0$. The proposed strategy is to look for the regression residuals that revert most strongly. Thus, we auto-regress each $\tilde{R}_i$ and find the residuals with the most negative autoregressive coefficients:

$$\tilde{R}_{it} = \rho_i \tilde{R}_{i,t-1} + \varepsilon_{it}, \qquad i = 1, 2, \ldots, N \tag{26}$$

Figure 5 shows the top five mean-reverting residuals in the first historical window.

Figure 5: The top 5 mean-reverting residuals in the first historical window

A trading portfolio that contains $n$ stocks is said to be market-neutral if the dollar amounts $\{Q_i\}_{i=1}^{n}$ invested in the stocks of this portfolio satisfy:

$$\bar{\beta}_j = \sum_{i=1}^{n} \beta_{ij} Q_i = 0, \qquad j = 1, 2, \ldots, K \tag{27}$$

where $\beta_{ij}$ is the coefficient from regressing stock $i$ on factor $j$. In code, we use the null space to solve this linear system:

$$Q = \mathrm{Null}\{\beta_{[K] \times [n]}\} \tag{28}$$

In order to guarantee a non-zero solution for the portfolio, we need to choose $n = K + 1$ stocks, those with the $K + 1$ smallest autoregressive coefficients, as our portfolio members. Then we have

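$$
\begin{aligned}
\sum_{i=1}^{K+1} Q_i R_i &= \sum_{i=1}^{K+1} Q_i m_i + \sum_{i=1}^{K+1} Q_i \left[ \sum_{j=1}^{K} \beta_{ij} F_j \right] + \sum_{i=1}^{K+1} Q_i \tilde{R}_i \\
&= \sum_{i=1}^{K+1} Q_i m_i + \sum_{i=1}^{K+1} Q_i \tilde{R}_i + \sum_{j=1}^{K} \left[ \sum_{i=1}^{K+1} \beta_{ij} Q_i \right] F_j \\
&= \sum_{i=1}^{K+1} Q_i (m_i + \tilde{R}_i)
\end{aligned}
\tag{29}
$$

From this equation, it is clear that the portfolio return has nothing to do with the market environment. It depends only on the intrinsic factor $m_i$ and a random variable $\tilde{R}_i$, which is a mean-zero stationary process satisfying mean-reversion.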
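The null-space step of equation 28 and the cancellation in equation 29 can be checked numerically. The sketch below assumes NumPy and obtains the null space from a singular value decomposition; the function name and the scaling of $Q$ to unit gross exposure are choices made for this example only.

```python
import numpy as np


def market_neutral_weights(beta):
    """beta: loadings of n = K + 1 stocks on K factors, shape (n, K).
    Returns Q of length n with beta.T @ Q = 0 (equation 27)."""
    b = np.asarray(beta, dtype=float)
    # With full_matrices=True (the default) vt is (n, n). When beta.T has
    # rank K, the last row of vt spans its one-dimensional null space.
    _, _, vt = np.linalg.svd(b.T)
    q = vt[-1]
    # Scaling assumption for illustration: absolute weights sum to one.
    return q / np.abs(q).sum()
```

With such a $Q$, the factor term $\sum_j \left[\sum_i \beta_{ij} Q_i\right] F_j$ vanishes for any factor realization, so the portfolio return reduces to $\sum_i Q_i (m_i + \tilde{R}_i)$.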

The next step is to generate signals for entering a trade. Substituting the autoregression 26 into equation 29, we have:

$$\sum_{i=1}^{K+1} Q_i R_{it} = \sum_{i=1}^{K+1} Q_i (m_i + \tilde{R}_{it}) = \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{i,t-1} + \varepsilon_{it}) \tag{30}$$

Suppose we are at the last period $T$ of the historical window. From the above equation, the expected portfolio return at $T+1$ is

$$
\begin{aligned}
E_T\left( \sum_{i=1}^{K+1} Q_i R_{i,T+1} \right) &= E_T\left[ \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT} + \varepsilon_{i,T+1}) \right] \\
&= \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT})
\end{aligned}
\tag{31}
$$

When $\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT})$ is very high (positive), we could buy this portfolio and expect to get a high return. When it is sufficiently negative, we could short this portfolio and still expect to get a high return. Thus, we can directly use $\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT})$ as our trading signal, where each $\rho_i$ is a negative coefficient. Define the signal as $S_T$. In our strategy, the criteria for entering a trade are:

1. If $S_T - \mathrm{mean}\{S_t\} \ge 0.7\,(\max\{S_t\} - \mathrm{mean}\{S_t\})$, $t = (T-59), \ldots, T$: enter the trade, going long the stocks whose $Q_i$ are positive by the amount $|Q_i|$ and short the stocks whose $Q_i$ are negative by the amount $|Q_i|$. This gives an expected return of $\left| \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT}) \right|$.

2. If $S_T - \mathrm{mean}\{S_t\} \le 0.7\,(\min\{S_t\} - \mathrm{mean}\{S_t\})$, $t = (T-59), \ldots, T$: enter the trade, going long the stocks whose $Q_i$ are negative by the amount $|Q_i|$ and short the stocks whose $Q_i$ are positive by the amount $|Q_i|$. This gives an expected return of $\left| \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT}) \right|$.

Finally, as the historical window moves forward, we compare these expected portfolio returns with the real portfolio returns and obtain a correlation result.
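The two entry criteria can be written as a small decision function. This is a sketch for illustration: the function name, the return convention (+1, -1, 0) and the extra sign guard that avoids trading when all signals in the window are equal are assumptions of the example, not part of the rules stated above.

```python
def entry_decision(signals, threshold=0.7):
    """signals: S_t over the historical window, the last element being S_T.

    Returns +1 for rule 1 (hold the portfolio as given by Q), -1 for rule 2
    (hold the reversed portfolio) and 0 when neither rule fires.
    """
    mean = sum(signals) / len(signals)
    deviation = signals[-1] - mean
    # The sign guards are an added assumption: without them a flat window
    # (max = min = mean) would satisfy rule 1 trivially.
    if deviation > 0 and deviation >= threshold * (max(signals) - mean):
        return 1
    if deviation < 0 and deviation <= threshold * (min(signals) - mean):
        return -1
    return 0
```

Because a trade is entered only when one of the rules fires, the number of forecasts produced by this model is smaller than the number of window positions.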

4 Comparison and Analysis

Table 2 shows the comparison of these four methods for forecasting stocks' log-returns in the S&P 500 universe. Some parameters need clarification. In the time series models (AR and VAR), we use a historical window of 1000 days, while in the statistical arbitrage models, following Avellaneda and Lee (2010), we use the past 60 days of records as our information set for trading. 'Common factors' refers to the number of other time series used in the forecast. In the VAR and Market Neutral models, it is the number of PCs that we used. As the last model uses a signal to decide whether or not to enter a trade, its number of forecasts is smaller than that of the others.

Table 2: Comparison between four types of forecasting methods

Method           Length of historical window   Common factors used   Forecasting times   Correlation with real returns
AR               1000                          NA                    1000                3.17%
Pair Trading     60                            2                     1000                13.89%
VAR              1000                          5                     1000                4.52%
Market Neutral   60                            15                    503                 17.2%
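The last column of Table 2 is a correlation between forecast and realized returns. As a point of reference, a sample Pearson correlation can be computed as below; treating the reported figure as a Pearson coefficient is an assumption of this sketch, and the function name is hypothetical.

```python
from math import sqrt


def pearson_correlation(forecast, realized):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_r = sum(realized) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(forecast, realized))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_r = sum((r - mean_r) ** 2 for r in realized)
    return cov / sqrt(var_f * var_r)
```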

From the table we can see that both the AR and VAR models exhibit little forecastability. In the AR model, we investigate only each stock's own log-returns. We know that individual log-price processes are almost random walks, in the sense that log-returns, the differences of log-prices, are almost white noise. Even though they are stationary processes, it is still hard to forecast their subsequent behavior. In the VAR model, we want to capture more of the market information that affects a stock's behavior. Thus we switch to looking at how the common market factors (PCs) evolve. By putting the forecast values of the PCs into the original regressions, we get the predicted values for each stock. However, we see that the improvement in forecasting each individual stock's log-returns is trivial, so it would not add much opportunity to earn money. Moreover, when we extend our forecasting window to five days, we find that the accuracy of these two methods decreases as the forecasting period increases (see Table 3). Overall, time series forecasting provides reasonable credibility over short periods of time, but the accuracy of forecasting diminishes sharply as the length of prediction increases.

Table 3: Time series methods to forecast different periods

Method   Length of historical window   Days to forecast   Correlation with real returns
AR       1000                          1                  3.17%
VAR      1000                          1                  4.52%
AR       1000                          5                  1.38%
VAR      1000                          5                  1.90%

Nonetheless, from Table 2, we find that the second and last models improve forecastability considerably. The underlying methodology in the second and last models is mean-reversion, a mathematical concept sometimes used for stock investing. This concept suggests that prices and returns eventually move back towards the mean or average. Revisiting equation 9 in the Pair Trading model, we see that the pair of stocks' log-prices are cointegrated and the residuals after regression are supposed to move around the average. By mean-reversion, we expect $dX_t$ to have a negative correlation with $X_t$. This is not only a property that we infer or extract from data, but is also supported by a theoretical model, the Ornstein-Uhlenbeck (O-U) process. In mathematics, the O-U process (see Gardiner (1985)) is a stochastic process that describes the velocity of a massive Brownian particle under the influence of friction. The process is stationary, Gaussian, and Markovian. Over time, the process tends to drift towards its long-term mean: such a process is called mean-reverting. Moreover, another important and widely used assumption in Finance is that the stochastic movement of stock prices follows geometric Brownian motion. Thus, for the $X_t$ in equation 9, we could apply the O-U process and get:

$$dX_t = \kappa (m - X_t)\,dt + \sigma\,dW_t, \qquad \kappa > 0 \tag{32}$$

where $m$ is the mean of $X_t$, $dW_t$ is the increment of Brownian motion ($W_t \sim N(0, t)$), $\sigma$ measures the volatility of the movement, and the parameter $\kappa$ is called the speed of mean-reversion. This process is stationary and auto-regressive with lag 1. In particular, the increment $dX_t$ has unconditional mean zero and conditional mean equal to

$$E\{dX_t \mid X_s,\ s \le t\} = \kappa (m - X_t)\,dt$$

When $X_t > m$, we expect $dX_t$ to be negative, and $X_t < m$ implies a positive expected $dX_t$. Rearranging equation 32 slightly, we get:

$$dX_t = \kappa m\,dt - \kappa\,dt \cdot X_t + \sigma\,dW_t \tag{33}$$

Comparing it with equation 12 in the Pair Trading model, we find that they have the same form, with

$$\beta_0 = \kappa m\,dt, \qquad \beta_1 - 1 = -\kappa\,dt, \qquad \varepsilon_t = \sigma\,dW_t \tag{34}$$

This in turn supports the AR(1) model that we used for the process $X_t$, and finding the most negative coefficient $\beta_1 - 1$ is equivalent to finding the process with the highest speed of mean-reversion.
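Equation 34 can be inverted to read O-U parameters off an estimated AR(1) model. The sketch below is illustrative only: the function name is hypothetical, the default time step of one trading day out of 252 per year is an assumption, and $\sigma$ is recovered from the residual standard deviation using the fact that a Brownian increment over $dt$ has standard deviation $\sqrt{dt}$.

```python
from math import sqrt


def ou_parameters(b0, b1, resid_std, dt=1.0 / 252):
    """Invert equation 34: b0 = kappa*m*dt and b1 - 1 = -kappa*dt,
    with std(eps) = sigma*sqrt(dt). Returns (kappa, m, sigma)."""
    if not b1 < 1.0:
        raise ValueError("mean reversion requires b1 - 1 < 0")
    kappa = (1.0 - b1) / dt
    m = b0 / (1.0 - b1)
    sigma = resid_std / sqrt(dt)
    return kappa, m, sigma
```

A larger `kappa` means a shorter characteristic reversion time $1/\kappa$, which is why ranking pairs by the most negative $\beta_1 - 1$ selects the fastest mean-reverting residuals.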

For the last model, the Market Neutral model, we used another method to identify mean-reverting processes. Instead of studying cointegrated log-prices, we directly regress log-returns, which are already stationary processes, on the common market factors (PCs). The residuals after regression (including a constant term) have mean zero. But there is no rigorous model to support the claim that the residual series are mean-reverting around zero.


The relationship in equation 26, i.e., $\tilde{R}_{it} = \rho_i \tilde{R}_{i,t-1} + \varepsilon_{it}$ with $\rho_i < 0$, is basically an assumption. However, we searched over all the stocks and found those that are most likely to obey this relationship (see Figure 5). Therefore, for the stocks we have chosen, it is reasonable to assume that the residuals $\tilde{R}_{it}$ after regression on the common market factors oscillate near zero, and we can effectively apply the mean-reversion method. Nevertheless, we need to keep in mind that not all stationary processes are mean-reverting or can be used for mean-reversion. Moreover, if a random walk I(1) process has mean zero, the probability that it crosses zero is one, but the mean time to crossing zero is infinite. Thus, we cannot apply mean-reversion directly to a random walk process either.

The reason for a relatively good performance of the second and lastmodel is that, in stead of focusing on forecasting variables themselves, wepay attention on the residuals. Either by the existing theories or econo-metrics analysis, we extract more information on the property of residuals,which exhibit more forecastability. Just as the famous saying in Finance:“Profit comes from residuals”. The other learning from our research isthat, there is not a ‘real’ model explaining stocks’ prices or returns inFinance. All the existing theory are partially right, and all the model areonly valid when the assumptions are reasonable. For example, the funda-mental assumption for the O-U process or the famous Black-Scholes modelis that the underling stock price St follows geometric brownian motion

$$S_{t+1} - S_t = (r - q) S_t \, dt + \sigma S_t \, dW_t \;\Longrightarrow\; \frac{S_{t+1}}{S_t} = 1 + (r - q) \, dt + \sigma \sqrt{dt} \cdot Z, \quad Z \sim N(0, 1) \tag{35}$$

which suggests that the log-price is an autoregressive process driven only by its own past and not affected by other variables. However, we have already found that this does not hold most of the time. There are too many variables and factors that could influence the stock markets. Even if one model works well for a time, once many people begin to use it, their trading and investment behavior in turn affects the market and may offset the utility of that model. Hence, unlike some problems in Economics, financial markets are full of noise (Black (1986)) and hard to model. The job is to find a little useful information in this enormous environment, seize the opportunity, and make money.
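The discretization in equation 35 can be simulated directly. The sketch below is a minimal illustration with assumed parameter values (the numbers chosen for r, q, σ and dt are not taken from this paper): it generates one price path and measures the lag-one autocorrelation of the log-returns. Under this model each step depends only on the current price and an independent shock, so that autocorrelation should be close to zero.

```python
import numpy as np

def simulate_gbm_path(s0, r, q, sigma, dt, n_steps, rng):
    """Simulate S_{t+1} / S_t = 1 + (r - q) dt + sigma sqrt(dt) Z, Z ~ N(0, 1)."""
    z = rng.standard_normal(n_steps)
    gross_returns = 1.0 + (r - q) * dt + sigma * np.sqrt(dt) * z
    # Prepend 1.0 so that the path starts exactly at s0.
    return s0 * np.concatenate([[1.0], np.cumprod(gross_returns)])

# Assumed, purely illustrative parameters: daily steps, 20% annual volatility.
rng = np.random.default_rng(1)
n_steps = 5000
path = simulate_gbm_path(s0=100.0, r=0.03, q=0.01, sigma=0.2,
                         dt=1.0 / 252.0, n_steps=n_steps, rng=rng)

log_returns = np.diff(np.log(path))
# Lag-one autocorrelation of log-returns: near zero for this model.
lag1_autocorr = np.corrcoef(log_returns[:-1], log_returns[1:])[0, 1]
```

In such a simulated series there is no residual structure left to exploit, which is one way to see why a model built on this assumption alone offers little forecastability.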

5 Conclusion

Practical experiments and backtesting results illustrate that the traditional time series methods do not work well. The AR and VAR models, which belong to univariate and multivariate time series analysis respectively, achieve less than 5 percent accuracy. When the forecasting period increases, the accuracy decreases significantly. This suggests that it is hard to derive a true recurrence relation that can be used to predict new values. However, the Pair Trading and Market Neutral models, which are based on the statistical arbitrage principle, improve the forecastability to more than 10 percent. The idea is to form a pair or a portfolio whose returns depend only on the values of the residuals; by further exploiting the mean-reversion property of these residuals, we gain more forecastability.

References

(2012). Standard & Poor’s 500 index - S&P 500. Investopedia.

(2013). S&P Indice Methodology. Standard And Poor’s.

Avellaneda, M. and J.-H. Lee (2010). Statistical arbitrage in the US equities market. Quantitative Finance 10 (7), 761–782.

Black, F. (1986). Noise. The Journal of Finance 41 (3), 529–543.

Box, G. E., G. M. Jenkins, and G. C. Reinsel (2013). Time series analysis: forecasting and control. John Wiley & Sons.

Gardiner, C. (1985). Stochastic methods. Springer-Verlag, Berlin–Heidelberg–New York–Tokyo.

Hamilton, J. D. (1994). Time series analysis, Volume 2. Princeton University Press, Princeton.

Hannan, E. J. and B. G. Quinn (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society. Series B (Methodological), 190–195.

Hassan, M. R. and B. Nath (2005). Stock market forecasting using hidden Markov model: a new approach. In Intelligent Systems Design and Applications, 2005. ISDA’05. Proceedings. 5th International Conference on, pp. 192–196. IEEE.

Hassan, M. R., B. Nath, and M. Kirley (2007). A fusion model of HMM, ANN and GA for stock market forecasting. Expert Systems with Applications 33 (1), 171–180.

Hirsa, A. (2012). Computational methods in finance. CRC Press.

Jolliffe, I. (2005). Principal component analysis. Wiley Online Library.

Lawrence, R. (1997). Using neural networks to forecast stock market prices. University of Manitoba.

Lo, A. W. and A. C. MacKinlay (1990). When are contrarian profits due to stock market overreaction? Review of Financial Studies 3 (2), 175–205.

Miller, M. H., J. Muthuswamy, and R. E. Whaley (1994). Mean reversion of Standard & Poor’s 500 index basis changes: Arbitrage-induced or statistical illusion? The Journal of Finance 49 (2), 479–513.

Pole, A. (2008). Statistical arbitrage: algorithmic trading insights and techniques, Volume 411. John Wiley & Sons.

Stock, J. H. and M. W. Watson (1988a). Testing for common trends. Journal of the American Statistical Association 83 (404), 1097–1107.

Stock, J. H. and M. W. Watson (1988b). Variable trends in economic time series. The Journal of Economic Perspectives, 147–174.
