Equity Trading with an Ensemble Neural Network System
Combining factor selection by non-parametric statistics with
evolutionary algorithms and artificial intelligence
Emil Tingström
April 16, 2012
Abstract
This paper relates to the use of mathematical models to produce excess return, or
alpha, when trading equity derivatives. With the advent of computers, speculators and
investors have been trying to harness computational power to create more efficient and
profitable portfolio allocation strategies. Computers make it possible to test complex and
calculation-intensive trading methods that are objective, in contrast to the usual investment
decisions made by humans, which are subjective.
Several possibly predictive factors from three categories are tested and analyzed using a
non-parametric hypothesis test called Bootstrap resampling as well as a parametric one called
Student's t-test. From the distribution of daily returns of each factor tested, the p-value can
be derived, which is the probability that a mean return as extreme as the one observed could
be the result of random chance alone. The factors that proved to be statistically
significant over an eight-year period in predicting the exchange-traded fund XACT OMXS30
were implemented as inputs to an ensemble of neural networks with weights and thresholds
evolved by a genetic algorithm to achieve the best possible p-value for positive returns in the
in-sample period. The ensemble was then tested on an out-of-sample period and showed
highly significant positive returns.
Foreword
This project was carried out within the mathematics and computer science track of the
Natural Science Programme at Donnergymnasiet. The work follows the goals set for the
programme by applying theoretical models with a scientific approach in an attempt to
describe reality in a mathematical way. Through experimental studies the models are
refined and the results are interpreted from an objective point of view.
I would like to thank my supervisors Leif Duveborg and Per Stumle, my English teacher Meg
Johansson, and my programming teachers Joakim Wassberg and Johan Sköldh.
Klintehamn, 16 April
Emil Tingström
Contents
1. Introduction
1.1 Speculation
1.2 Backtesting and the scientific method
1.2.1 Data mining and data snooping
1.2.2 De-trending data
1.2.3 Student's t-test
1.2.4 Bootstrap method
1.3 Data and software
2. Momentum or Mean Reversion
2.1 Return
2.2 Close in relation to the range
3. Seasonal Tendencies
3.1 By day of the week
3.2 By day of the month
4. Intermarket analysis
4.1 Sector analysis
4.2 Sector spread analysis
5. Combining the Predictive Factors into a Learning Ensemble System
5.1 Ensemble Trading Model
5.1.1 Artificial Neural Networks
5.1.2 Genetic Algorithms
5.1.3 Ensemble learning
5.2 Method
5.3 Results
5.3.1 In-sample performance
5.3.2 Out-of-sample performance
6. Conclusion
7. References
Appendix A – Excel Testing
Appendix B – C# code for implementing the neural networks
Appendix C – Bootstrap code
1. Introduction
The following part serves as an introduction to the use of statistics to analyze price
movements in the equity market.
1.1 Speculation
Speculation has always intrigued humans, as it offers the opportunity to make a profit solely
by buying at a lower price than you sell at. One of the most common ways to speculate is
through the stock market, where shares representing ownership in different companies are
traded. The movements of these shares are based on the price agreements buyers
and sellers make at the stock exchange. Predicting these movements has been the subject of a
lot of work by many people throughout the centuries. Many have been successful in their
endeavor, but far more have lost, as their edge over the other market participants might only
have been an illusion or might have disappeared through exploitation. The entire
economic system is a highly interconnected, complex and dynamic entity, and the number
of factors that affect it is vast. The price movement of a single stock reflects not just the
current news about the company but also the aggregate news of every related security. If
the market were truly efficient, the news would be synthesized into the price
immediately, without any lag. This, however, would assume that all market participants are
always rational as well as always fast enough to react to every change in the known
information. Only through historical analysis can we know whether that is the case.
1.2 Backtesting and the scientific method
Backtesting is the process of evaluating a strategy, theory, or model by applying it to
historical data (Wikipedia, 2011). Prediction is decision-making based on a view that is
assumed to hold true in the future and therefore the more confidence one has in the view
the more certain one can be in the prediction. Karl Popper argues that any theory that is to
be considered scientific (and hence true) must be falsifiable (Thornton, 1997). Only if the
theory can be disproved by objective tests can it hold any value. David Aronson takes this
view further and concludes that there are two views that people base their investment
decisions on; one subjective based on gut feeling and one objective based on verifiable rules.
The subjective view is, according to Aronson, always inferior to the objective view: while
the objective view might be proven false, the subjective view can never be tested at all, thus
rendering it absent of any useful meaning (Bukey, 2007). The proper way to make
investment decisions is therefore to follow an objective theory that is verified by statistical
tests. However, establishing such a theory poses many issues.
This paper examines what predictive factors might exist in the Swedish stock market with
non-parametric statistical methods. The resulting factors are then tested as inputs into a
trading system based on the aggregated output of a large number of artificial neural
networks optimized by simulated evolution. The results are then evaluated for both the
training period and the validation period to see if the ensemble can generate significant
outperformance.
This paper is structured as follows: Chapter 2 examines the effect that price return has on
future returns, Chapter 3 looks at anomalies in the returns distribution on specific dates and
Chapter 4 evaluates the effect the price return of sector indices has on the instrument
tested. Chapter 5 then uses the resulting factors that have proven to be statistically
significant as inputs into an ensemble of neural networks and examines the result.
1.2.1 Data mining and data snooping
When dealing with historical data many issues need to be considered in order to secure the
validity of the test. Stock market returns are by their nature very noisy and contain a large
amount of randomness. Whenever a test of a rule is conducted there is a chance that the
results might be purely due to chance alone. This chance grows larger when data mining for
profitable rules to test. Data mining is the process of discovering new patterns from data
sets by “mining” from them using a wide range of possible methods. The more rules that are
tested, the more the final or the best rule runs the risk of data snooping. Data snooping (or
data dredging, data fishing) is the inappropriate use of data mining to uncover misleading
relationships in data. Data snooping is most likely to occur when the data sample is too
small, leaving the rules tested without the ability to make robust generalizations out-of-
sample. A small sample is especially susceptible to being data-snooped when the function
used to describe it contains a large number of variables. The more variables in a function (or
rule, strategy), the bigger the parameter space and hence the larger the number of
combinations that could be tested and found profitable by chance.
Another issue when backtesting a strategy across several stocks is that the sample might not
fully represent all stocks available at the time. For example, consider a backtest performed
on the stocks that make up a stock index today. Since it is likely that the members that
constitute the index will have changed over time as companies rise and fall, that backtest
will be biased toward a positive return, as the companies dropped from the list will be those
that performed the worst in the past. This is called survivorship bias and needs to be
accounted for in order to have a realistic simulation. (Hassler, 2011)
1.2.2 De-trending data
Historical trends in stock prices will influence any test done by shifting the expected mean
for a strategy that trades randomly away from zero. In order to neutralize this bias the
average daily return is subtracted from the return of each day in the test period. This creates
a detrended data sample with a mean arithmetic return of zero on which tests can be
conducted.
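The de-trending step above can be sketched in a few lines. This is an illustrative Python version (the thesis performs its tests in Excel and C#); the function name and sample values are hypothetical:

```python
def detrend(returns):
    """Subtract the sample's average daily return from each day's return,
    yielding a series with an arithmetic mean of zero."""
    mean = sum(returns) / len(returns)
    return [r - mean for r in returns]

# A toy series of daily returns; the detrended version has mean zero.
detrended = detrend([0.012, -0.004, 0.007, -0.001])
assert abs(sum(detrended)) < 1e-12
```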
1.2.3 Student's t-test
In order to determine with any confidence that the return earned by a rule in a backtest is
not due to chance, a hypothesis test is made. One common test is the t-test. When testing
the null hypothesis that the sample's mean is equal to a specified value \mu_0, one uses the
statistic

t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

where \bar{x} is the sample mean, s is the sample standard deviation and n is the sample
size.
The sample standard deviation is defined as

s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}

where \{x_1, \dots, x_n\} is the vector of the observed values in the sample.
Once a t value is determined, a p-value can be found, given the degrees of freedom n - 1,
using a table of values from Student's t-distribution. The p-value represents the probability
that the observed test statistic is the result of chance alone. Typically the threshold for
statistical significance is set at 0.05 or 0.01; if the p-value falls below it, the null hypothesis
is rejected in favor of the alternative hypothesis.
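The statistic above can be computed directly. A minimal Python sketch (not the thesis's Excel implementation; the function name is hypothetical), using the unbiased sample standard deviation with n − 1 in the denominator:

```python
import math

def t_statistic(sample, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with the unbiased sample std s."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n))

# For the sample [1, 2, 3] against mu0 = 0: mean 2, s = 1, so t = 2 * sqrt(3).
t = t_statistic([1.0, 2.0, 3.0])
```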
1.2.4 Bootstrap method
The stock market is a highly complex and dynamic system, and even if it has often been
assumed to follow a Gaussian or “normal” distribution, Mandelbrot suggested that the
distribution of stock market returns is far more prone to “fat tails” (outliers such as stock
market crashes). (Kaplan, 2004) This might make a parametric hypothesis test such as
Student's t-test show inaccurate results, since Student's t-distribution is derived from a
normal distribution. A more robust alternative is a non-parametric test such as the Bootstrap
method, in which the assumed distribution is derived from the actual test statistics.
(Aronson, 2006)
The Bootstrap procedure goes like this: compute the mean

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

of our return vector \{x_1, \dots, x_n\}, where n is the number of days derived from the
backtest, and subtract it from every value, giving the vector a mean of zero, a process
known as zero-centering (not to be confused with de-trending, which we have already
done). From this new vector we create B resamples by drawing n values at random with
replacement. To compute the p-value, we calculate the probability that the mean \bar{x}^*
of any resample is greater than or equal to the observed mean,

p = P(\bar{x}^* \ge \bar{x})

By using the actual sample to create a distribution from which we can make comparisons,
we only assume that the sample is randomly drawn from a larger population that is
independent and identically distributed. Since each resample is drawn at random, the
p-value becomes exact only as the number of resamples approaches infinity. For the sake of
accuracy, B has to be very large but still achievable within a given timeframe. For the tests
conducted in this paper, B was set to 500,000 resamples.
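A minimal sketch of the procedure follows. The thesis's actual implementation is the C# code in Appendix C; this Python version is illustrative, the function name is hypothetical, and the resample count is reduced for speed:

```python
import random

def bootstrap_p_value(returns, n_resamples=10_000, seed=42):
    """One-sided bootstrap p-value for the null that the true mean is zero.

    Zero-centers the sample, draws resamples with replacement, and counts
    how often a resample mean is >= the observed mean."""
    rng = random.Random(seed)
    n = len(returns)
    observed_mean = sum(returns) / n
    centered = [r - observed_mean for r in returns]
    hits = 0
    for _ in range(n_resamples):
        resample = [rng.choice(centered) for _ in range(n)]
        if sum(resample) / n >= observed_mean:
            hits += 1
    return hits / n_resamples

# A sample with a strongly positive mean yields a very small p-value.
p = bootstrap_p_value([0.01] * 20 + [-0.002] * 5)
```

The thesis uses B = 500,000 resamples; for negative observed means the comparison is mirrored, as described in section 2.1.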
1.3 Data and software
All historical prices for Swedish stocks, ETFs and indices were obtained from NASDAQ OMX
Nordic’s website. Whenever there was a missing value for any specific day that day was
excluded from the backtest. By using the data directly from the exchange it can be assumed
to be more historically accurate than from other data vendors. The tests were performed in
Microsoft Excel 2010 and the Bootstrap test for statistical significance, the genetic algorithm
and the neural networks were all done in Microsoft Visual C# 2010 Express.
2. Momentum or Mean Reversion
2.1 Return
The simplest choice when considering the factors that might impact future price movements
would be past returns, as that probably is what most investors will examine before making
any investment decision. Numerous academic studies indicate that momentum, or the
tendency for the price to correlate positively with its past return, is a significant effect in
stock markets all over the world. This is not limited to the stock market but extends to
commodities, fixed income and currency markets. The opposite, mean reversion, i.e. the
tendency for prices to revert to their mean, has also been observed in some markets and
time frames.
In order to observe the effects of past return on future return, we first have to quantify past
return in an easily manageable form. To start off, the specific time period during which the
return will be measured needs to be specified. A lot of research has been focused on longer
time periods (months and years) to observe momentum within economic cycles; however,
this introduces the problem of data snooping due to a lack of data. Testing a strategy that
trades once a year on ten years' worth of data will probably fail to meet any basic
requirements of statistical significance. This is often countered by performing the test across
a wide variety of markets, but as the tests in this paper are only conducted on one market,
the time period specified will have to be short to get enough trades to analyze. For all future
purposes, the time period measured will be all five intervals between one and five days. The
formula for this will be:

R(N) = \frac{1}{N} \sum_{i=0}^{N-1} \frac{C_i - C_{i+1}}{C_{i+1}}

where C_0 is the closing price today and C_N is the closing price N days ago, with
1 \le N \le 5.
R(N) is then normalized against shifting levels of volatility and trend biases by ranking
today's R(N) against the same values for the past 252 days (roughly corresponding to one
year in trading days). The percentile rank can be expressed by the formula:

PR = \frac{c_\ell + 0.5\,f}{n}

where c_\ell is the count of all past R(N)s less than the R(N) of interest, f is the frequency of
R(N) and n is the number of observations in the sample (here 252). By ranking today's return
against past returns you get the relative standing of that value as a percentage, which will
be the standard method for expressing a factor in this paper from this point forward. The
advantage of doing this is that the analysis of the factor is less likely to be influenced by
differences in volatility and trend in the test sample, thereby reducing the danger of data
snooping. The percentile rank could also be inferred from the z-score¹, but as discussed
earlier the data sample might not conform to a normal distribution, and thus the
non-parametric method is preferable. Another advantage is that the factors are more easily
combined, which will prove useful in later chapters.
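The two steps (the N-day average return and its percentile rank against the trailing 252 values) can be sketched as follows; this is illustrative Python rather than the Excel used in the thesis, and both function names are hypothetical:

```python
def avg_daily_return(closes, n):
    """R(N): average one-day return over the past n days.
    closes[0] is today's close, closes[n] is the close n days ago."""
    return sum((closes[i] - closes[i + 1]) / closes[i + 1] for i in range(n)) / n

def percentile_rank(history, value):
    """PR = (c_l + 0.5 * f) / n, where c_l counts past values below `value`
    and f counts past values equal to it."""
    below = sum(1 for v in history if v < value)
    equal = sum(1 for v in history if v == value)
    return (below + 0.5 * equal) / len(history)

# Ranking 3 against the history [1, 2, 3, 4] gives (2 + 0.5) / 4 = 0.625.
rank = percentile_rank([1, 2, 3, 4], 3)
```

In the thesis `history` would hold the past 252 values of R(N), so the rank measures where today's return stands relative to the last trading year.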
The tests will be conducted on XACT OMXS30, which is a tradable proxy for OMXS30 GI
which in turn represents the thirty most traded stocks at the Stockholm Exchange (Fonder).
XACT OMXS30 is a so-called Exchange-Traded Fund, or ETF, that can be traded just like any
stock at NASDAQ OMX. An accurate price relative to the underlying value of the fund (NAV, or
Net Asset Value) is guaranteed by independent market makers. The costs involved when
trading it are the management fee of 0.3% and the standard trading commission at the
broker. Since I am testing on an index, survivorship bias is not an issue, and since the rules I
am using for testing are simple and generalized, the risk of data snooping should be low.
¹ The standard score, or z-score, is simply the number of standard deviations an observation lies above or below the mean. This value could then be checked against a normal distribution to get the percentile rank, since the normal distribution and the actual distribution are assumed to be the same.
Figure 2.1.1 Chart of XACT OMXS30’s price adjusted for dividends during the testing period.
The backtests are first conducted on the price history for XACT OMXS30 from the 21st of
January 2004 to the 17th of February 2012. Since XACT OMXS30 pays its owners dividends
every year, which are not included in the price history provided by NASDAQ OMX Nordic's
data feed, a synthetic price history was created to account for this. Using
history at XACT’s webpage (Xact, 2012), I simply added the dividend to the prices starting
from the ex-dividend day. This way the dividend is included in the returns as they would
have been in reality, with the only difference that they were paid the same day the NAV was
adjusted for it. At most 257 days are required to calculate the ranked return value. XACT
OMXS30 has a history going back to October 2000, but due to inadequate data with missing
values and low trading volume for the ETF, only the history starting from January 2002 will be
used in this paper. The test starts on the 21st of January 2004 to coincide with the start of
the later tests conducted in this paper (which required more history to start). The means are
calculated on detrended data to offset any bias due to trends in the sample and are based on
buying at the closing price when the ranked return value was below 0.5, as well as when it
was above 0.5, for that day and selling at the close of the next. This means that there would
have been some slippage in reality, as the trades are taken at the same price that the signal
is generated from. And as there is no closing call auction² for ETFs as there is with stocks,
bid and ask spreads are a concern. Also, trading commission will have had a negative impact
on the returns calculated. These issues will not be addressed in the tests, as their impact
depends on factors that cannot be modeled accurately without massive resources and full
knowledge of how the trading is conducted.
² A closing call auction is the time of the day when the final price is determined by grouping together all outstanding orders to find the price at which the maximum number of shares can be exchanged. This is usually done to obtain a “fair” price from which the values of the index or derivatives can be calculated.
In the following table the average arithmetic daily return is shown for look-back periods
from one to five days, as well as their respective significances arrived at from the Bootstrap
resampling method.
N Mean P-value
1 0.0690% 0.0494
2 0.0611% 0.0786
3 0.0612% 0.0761
4 0.0790% 0.0314
5 0.0700% 0.0543
Table 2.1.1 Bootstrapped significance for PR(R(N)) < 0.5
All the rules tested generated an average daily return above the expected value of roughly
0.06–0.08%, with weak to moderate statistical significance, indicating that the probability
that the result is a fluke is low but not negligible.
Given that values below 0.5 show positive excess return, values above 0.5 should be
expected to show negative excess returns.
N Mean P-value
1 -0.0705% 0.0337
2 -0.0637% 0.0415
3 -0.0641% 0.0433
4 -0.0818% 0.0150
5 -0.0734% 0.0205
Table 2.1.2 Bootstrapped significance for PR(R(N)) > 0.5
For all N tested, the statistical significance is below the 0.05 threshold, which indicates that
the null hypothesis is refuted and that the alternative hypothesis should be accepted. Since
the mean is so far to the left side of the distribution (the negative side), the Bootstrap test
calculates the p-value based on the probability P(\bar{x}^* \le \bar{x}), i.e. the likelihood
that any random resample drawn with replacement has a mean lower than or equal to the
observed one.
Below are the equity curves of non-compounding³ portfolios trading the percentile-ranked
return in the sample period on the detrended price history.
Figure 2.1.1 Non-compounding portfolios trading when PR(R(1)) < 0.5 and PR(R(1)) > 0.5
respectively, with linear regressions.
Figure 2.1.2 Non-compounding portfolios trading when PR(R(2)) < 0.5 and PR(R(2)) > 0.5
respectively, with linear regressions.
³ Since I am going to do a linear regression to measure the consistency of returns, the equity cannot compound, i.e. the returns will not be reinvested.
Figure 2.1.3 Non-compounding portfolios trading when PR(R(3)) < 0.5 and PR(R(3)) > 0.5
respectively, with linear regressions.
Figure 2.1.4 Non-compounding portfolios trading when PR(R(4)) < 0.5 and PR(R(4)) > 0.5
respectively, with linear regressions.
Figure 2.1.5 Non-compounding portfolios trading when PR(R(5)) < 0.5 and PR(R(5)) > 0.5
respectively, with linear regressions.
As evidenced by the backtests, the Swedish stock market has exhibited significant short-term
mean reversion tendencies in the past decade. This is in line with what Stokes
(Stokes, 2009) has found testing the short-term behavior of large-cap equity indices in the
United States. Stokes also notes that this contrarian behavior does not extend to longer
look-back periods, as the effect seems to be limited to the twenty-first century. Historically,
stock markets have had a tendency to “trend” in the short term, but this momentum effect
faded away, possibly with the advent of computerized trading.
Distinguishing the oversold levels from the overbought using the median leaves the question
whether more extreme levels might show greater returns and significance. For example, it is
unlikely that the exact distinction between when to expect negative returns and when to
expect positive returns lies exactly at a PR(R(N)) of 0.5. It makes much more sense that the
expected return for the next day is negatively correlated with the value, i.e. that the higher
the value, the stronger the negative return that can be expected. In order to test whether
this holds true, and whether this negative correlation is strong enough to justify additional
complexity regarding threshold values, I divide the dataset into ten deciles, i.e. ten bins at
regular intervals of ten percent of the distribution. Using the existing values already
percentile-ranked against the past 252 days, I calculate the mean next-day return for each
of these ten bins. I then perform a linear regression against the means to see how well they
correlate with the deciles.
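The decile test can be sketched as follows: bucket each day by the decile of its percentile-ranked factor, average the next-day returns per bucket, and fit a least-squares line through the decile means. An illustrative Python version with hypothetical names and inputs (the thesis does this in Excel):

```python
def decile_means(ranks, next_day_returns):
    """Mean next-day return per decile of the percentile-ranked factor."""
    sums, counts = [0.0] * 10, [0] * 10
    for rank, ret in zip(ranks, next_day_returns):
        d = min(int(rank * 10), 9)  # a rank of exactly 1.0 falls in the top decile
        sums[d] += ret
        counts[d] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept through the decile means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx
```

A negative fitted slope over the ten decile means corresponds to the mean reversion relationship described above.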
Figure 2.1.6 Mean daily return per decile for N = 1.
Figure 2.1.7 Mean daily return per decile for N = 2.
Figure 2.1.8 Mean daily return per decile for N = 3.
Figure 2.1.9 Mean daily return per decile for N = 4.
Figure 2.1.10 Mean daily return per decile for N = 5.
Interesting to note is that the R²-value, which is the squared correlation between the
regression line and the data points, seems to increase with N. There is a negative correlation
for each N tested, which confirms that past returns do influence future returns, but as
evidenced by the low R² this relationship is disturbed by a lot of noise, as one would expect
considering the efficiency and multidimensionality of the market.
2.2 Close in relation to the range
Another way to measure the distance the price has traveled is to consider the closing price
in relation to the high and the low for the same day. The formula for the relative close is:

CR(N) = \frac{1}{N} \sum_{i=0}^{N-1} \frac{C_i - L_i}{H_i - L_i}

In effect, the difference between the closing price C and the lowest price L for the day is
scaled by the difference between the highest price H and the low. As there are days,
especially early in the sample, when the high and low are the same price, those days were
set to 0.5 to avoid division errors. The average over N days is then taken and ranked as a
percentile against the values for the past 252 days, giving PR(CR(N)). I then test the factor
on the price history for XACT OMXS30 from the 21st of January 2004 to the 17th of February
2012 and compute the p-value with a Bootstrap test.
N Mean P-value
1 0.0535% 0.0964
2 0.0363% 0.1886
3 0.0814% 0.0294
4 0.0357% 0.1973
5 0.0207% 0.3124
Table 2.2.1 Bootstrapped significance for PR(CR(N)) < 0.5
N Mean P-value
1 -0.0531% 0.0897
2 -0.0394% 0.1574
3 -0.0819% 0.0143
4 -0.0353% 0.1809
5 -0.0200% 0.3028
Table 2.2.2 Bootstrapped significance for PR(CR(N)) > 0.5
As expected given the mean reversion tendencies confirmed in the previous subsection, the
relative close value is negatively correlated with the next day's return. This effect is,
however, not statistically significant other than for an N-day period of 3. Nevertheless, the
tests show that the mean reversion effect extends to the close in relation to the range as well.
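The relative-close factor used in this subsection can be sketched like this (illustrative Python with a hypothetical function name; the 0.5 fallback mirrors the adjustment for days where the high equals the low):

```python
def relative_close(highs, lows, closes, n):
    """CR(N): average of (C - L) / (H - L) over the past n days,
    with days where high == low set to 0.5."""
    total = 0.0
    for h, l, c in zip(highs[:n], lows[:n], closes[:n]):
        total += 0.5 if h == l else (c - l) / (h - l)
    return total / n

# Day 1 closes mid-range (0.5), day 2 closes at the high (1.0): CR(2) = 0.75.
cr = relative_close([10.0, 10.0], [8.0, 9.0], [9.0, 10.0], 2)
```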
3. Seasonal Tendencies
This chapter examines the impact that the specific calendar date has on returns. Just like the
demand and availability of an item change throughout the year, the behavior of investors
could also change.
3.1 By day of the week
A simple test of seasonal tendencies can be done by testing the effect the day of the week
has on the return of that day. There might be signs of excessive buying or selling depending
on which day of the week it is. I tested this by first calculating the mean daily return for each
day of the week and then running a Bootstrap test to check for statistical significance on the
detrended price history for XACT OMXS30 from the 21st of January 2004 to the 17th of
February 2012.
Weekday Mean P-value
Mon -0.0067% 0.4620
Tue -0.0381% 0.2706
Wed 0.0765% 0.1117
Thu -0.0304% 0.3150
Fri -0.0017% 0.4892
Table 3.1.1 Bootstrapped significance for returns per weekday
As indicated by the test results, no day of the week showed returns that would not be
expected to occur as a result of random chance.
3.2 By day of the month
When examining the historical returns for U.S. equities around the turn of the month,
McConnell and Xu found that there has been a strong tendency for positive return around
the turn of each month (McConnell, 2006). More precisely, from the last day to the third day
of each month returns have on average been so strong that an investor would receive no
reward for having exposure during the other trading days of the month. To test this on
Swedish equities, I first divide the trading days in the dataset according to their distance t
from the turn of the month. For example, t = -1 denotes the last trading day of the month
and t = 1 the first. Since the number of trading days in each month changes from month to
month, only the last eight and the first eight days will be examined. This is the maximum
number of days I can include in the test by distance from the turn of the month without
excluding certain months due to a lack of trading days (the month with the fewest days had
seventeen). I then calculate the mean return on the detrended price history for XACT
OMXS30 and use the distribution for each t to get the Bootstrapped statistical significance.
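Labeling each trading day by its distance from the turn of the month can be sketched as follows; this is an illustrative Python helper (not from the thesis), where `month_ids` is a hypothetical chronological list giving the month each trading day belongs to:

```python
def turn_of_month_labels(month_ids):
    """For each trading day, return (f, b): f counts 1, 2, ... from the first
    trading day of its month, b counts -1, -2, ... back from the last."""
    labels = []
    i, n = 0, len(month_ids)
    while i < n:
        j = i
        while j < n and month_ids[j] == month_ids[i]:
            j += 1  # j now points one past the last day of this month
        for k in range(i, j):
            labels.append((k - i + 1, k - j))
        i = j
    return labels

# Three days in month 1, two in month 2:
labels = turn_of_month_labels([1, 1, 1, 2, 2])
# → [(1, -3), (2, -2), (3, -1), (1, -2), (2, -1)]
```

A day then falls into the t = -1 bucket when its backward label is -1, into t = 1 when its forward label is 1, and so on for the eight days on each side.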
t Mean P-value
-8 -0.1313% 0.1908
-7 0.0349% 0.3951
-6 -0.1128% 0.1968
-5 0.0398% 0.3712
-4 0.0384% 0.3636
-3 0.2046% 0.0367
-2 0.1064% 0.2129
-1 0.0523% 0.2911
1 0.3139% 0.0151
2 0.0644% 0.3210
3 0.0014% 0.4965
4 -0.2186% 0.0660
5 -0.0324% 0.3948
6 -0.0521% 0.3716
7 -0.1437% 0.1333
8 -0.0429% 0.3635
Table 3.2.1 Bootstrapped significance for returns at the turn of the month
The test confirms McConnell and Xu's finding that there is a positive bias around the turn of
each month. Although the positive bias began five days before and stretched three days
after the turn of the month, the effect was only statistically significant at t = -3 and t = 1.
At t = 4 the effect turned to the opposite, and there was a statistically significant negative
mean daily return.
The following graph shows the equity curves of two portfolios trading the detrended XACT
OMXS30 price based on the day of the month.
Figure 3.1.2 Non-compounding portfolios trading around the turn of the month and on the
remaining days, respectively, with linear regressions.
This effect is more easily illustrated by a bar chart. The positive blue bars are centered
uniformly around the sign change of t, with every other day in red.
Figure 3.2.1 Average daily return at the turn of the month
In conclusion, the turn-of-the-month effect is very much alive in the Swedish market. The
majority of the profit is centered between three days before and one day after the switch,
and at four days after some of the profit is taken back by the market.
4. Intermarket analysis
Investors and portfolio managers might not just trade one instrument at a time but several
simultaneously as a way to diversify their returns and risks. At the same time, the
profitability of one security or stock might be heavily correlated to others as they might
share similar economic factors. The intricate net of factors that have an impact on share
prices makes intermarket analysis a relevant field of study. Given that there might be a lag
in the accurate pricing between correlated stocks, alpha could be generated by trading on
the discrepancy. In order to keep the number of factors considered in this paper low, only
sectors of OMXSPI will be analyzed.
4.1 Sector analysis
The following sector indices will be analyzed and backtested as part of an intermarket
strategy:
OMX Stockholm Financials PI
OMX Stockholm Health Care PI
OMX Stockholm Industrials PI
OMX Stockholm Consumer Services PI
OMX Stockholm Consumer Goods PI
OMX Stockholm Utilities PI
OMX Stockholm Basic Materials PI
OMX Stockholm Technology PI
OMX Stockholm Telecommunications PI
There are actually several more sectors that are represented as an index at NASDAQ OMX,
but due to lack of recorded price history those were left out of the testing. Notice how all of
these are Price Indices (PI), in contrast to the index that XACT OMXS30 tries to replicate,
which is a Gross Index (GI). While Price Indices only track the price of stocks, Gross Indices
also track the returns earned on dividends and the like.
To test the impact the returns of each sector have on the overall stock market, I perform the
same calculation previously made when testing for mean reversion tendencies:

r(N) = (1/N) · Σ_{i=1..N} (C_{t−i+1} − C_{t−i}) / C_{t−i}

where C_t is the closing price today and C_{t−N} is the closing price N days ago, with
1 ≤ N ≤ 5. The average daily return for the past N days is then ranked as a percentile against
the same values for the past 252 days and then used to analyze the returns of the ETF XACT
OMXS30.
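The two computations described above can be sketched as follows. This is my own reading of the truncated formula; the exact return definition in the source may differ slightly.

```python
def avg_daily_return(closes, t, n):
    """Mean of the n most recent one-day returns ending at index t:
    (1/n) * sum over i=1..n of (C[t-i+1] - C[t-i]) / C[t-i]."""
    return sum((closes[t - i + 1] - closes[t - i]) / closes[t - i]
               for i in range(1, n + 1)) / n

def percentile_rank(window, value):
    """Percentile of `value` within a trailing window, e.g. the past 252 days."""
    return sum(1 for w in window if w <= value) / len(window)
```

The factor fed to the later analysis is then `percentile_rank` of today's `avg_daily_return` against its own trailing 252-day history.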
Sector                   N = 1     N = 2     N = 3     N = 4     N = 5
Financials PI            0.0228%   0.0571%   0.0702%   0.0476%   0.0446%
Health Care PI           0.0120%   0.0405%   0.0204%   0.0019%   0.0178%
Industrials PI           0.0245%   0.0194%   0.0476%   0.0362%   0.0343%
Consumer Services PI    -0.0183%   0.0347%   0.0516%   0.0496%   0.0341%
Consumer Goods PI        0.0183%   0.0394%   0.0540%   0.0160%   0.0299%
Utilities PI             0.0397%   0.0115%   0.0605%   0.0076%   0.0220%
Basic Materials PI       0.0429%   0.0573%   0.0636%   0.0504%   0.0433%
Technology PI           -0.0123%   0.0214%   0.0068%   0.0140%   0.0263%
Telecommunications PI    0.0487%   0.0662%   0.0747%   0.0685%   0.0824%
Table 4.1.1 Average daily returns for ( )
Due to computational constraints when performing Bootstrap permutations, I instead use
the t-test to test the hypothesis that the return earned by the rule is not significantly
different from zero. To test both negative and positive means, the following formula is used:

t = |x̄ − μ| / (s / √n)

where x̄ is the sample mean, s the sample standard deviation and n the number of
observations. μ is set to 0 since the data the tests are performed on is already detrended.
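The test statistic above can be sketched as follows. The normal approximation to the t distribution is my substitution to keep the example dependency-free; for the exact tail probability one would use `scipy.stats.t.sf(t_stat, n - 1)`.

```python
import math
from statistics import mean, stdev

def t_pvalue(returns, mu=0.0):
    """One-sided p-value for the mean of `returns` against mu, using the
    absolute t statistic t = |mean - mu| / (s / sqrt(n)). The t distribution
    is replaced by a normal approximation (adequate for large n)."""
    n = len(returns)
    t_stat = abs(mean(returns) - mu) / (stdev(returns) / math.sqrt(n))
    return 0.5 * math.erfc(t_stat / math.sqrt(2))
```

A sample with mean zero yields p = 0.5, matching the many values near 0.50 in the tables below.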
Sector                   N = 1    N = 2    N = 3    N = 4    N = 5
Financials PI            0.2956   0.0922   0.0496   0.1269   0.1459
Health Care PI           0.3881   0.1673   0.3116   0.4818   0.3371
Industrials PI           0.2814   0.3267   0.1406   0.2003   0.2166
Consumer Services PI     0.3279   0.2070   0.1135   0.1256   0.2157
Consumer Goods PI        0.3314   0.1776   0.0995   0.3544   0.2422
Utilities PI             0.1514   0.3908   0.0682   0.4281   0.3034
Basic Materials PI       0.1618   0.0974   0.0750   0.1234   0.1635
Technology PI            0.3827   0.3074   0.4352   0.3692   0.2596
Telecommunications PI    0.1282   0.0645   0.0394   0.0523   0.0270
Table 4.1.2 P-values from Student’s t-test for ( )
Highlighted in green are the p-values that fall below a 0.10 threshold. All sectors but four fall
below that in at least one setting. It seems that the mean reversion effect previously
explored is also present in this analysis. Each sector can be assumed to correlate well with
OMXS30 since some of the components of the sectors are also components of the major
index, so this should not stand out from what could be expected. However, the strong and
significant performance of the Telecommunications sector index indicates that the major
part of the effect from mean reversion in OMXS30 comes from one sector. With an average
p-value of 0.0623 across all tested values of N, the performance of Telecommunications PI is
unlikely to be the result of data mining.
Sector                   N = 1     N = 2     N = 3     N = 4     N = 5
Financials PI           -0.0227%  -0.0595%  -0.0728%  -0.0503%  -0.0484%
Health Care PI          -0.0120%  -0.0415%  -0.0208%  -0.0019%  -0.0181%
Industrials PI          -0.0252%  -0.0198%  -0.0478%  -0.0362%  -0.0357%
Consumer Services PI     0.0185%  -0.0353%  -0.0528%  -0.0512%  -0.0353%
Consumer Goods PI       -0.0181%  -0.0404%  -0.0570%  -0.0163%  -0.0314%
Utilities PI            -0.0486%  -0.0116%  -0.0631%  -0.0080%  -0.0231%
Basic Materials PI      -0.0438%  -0.0580%  -0.0636%  -0.0496%  -0.0426%
Technology PI            0.0128%  -0.0218%  -0.0073%  -0.0149%  -0.0281%
Telecommunications PI   -0.0487%  -0.0697%  -0.0764%  -0.0707%  -0.0850%
Table 4.1.3 Average daily returns for ( )
Sector                   N = 1    N = 2    N = 3    N = 4    N = 5
Financials PI            0.2759   0.0542   0.0265   0.0958   0.0995
Health Care PI           0.3773   0.1405   0.2974   0.4799   0.3180
Industrials PI           0.2532   0.2965   0.0918   0.1664   0.1618
Consumer Services PI     0.3198   0.1755   0.0801   0.0825   0.1685
Consumer Goods PI        0.3199   0.1425   0.0677   0.3318   0.2009
Utilities PI             0.1250   0.3832   0.0570   0.4177   0.2684
Basic Materials PI       0.1162   0.0529   0.0386   0.0897   0.1190
Technology PI            0.3715   0.2827   0.4249   0.3491   0.2392
Telecommunications PI    0.0973   0.0270   0.0216   0.0317   0.0115
Table 4.1.4 P-values from Student’s t-test for ( )
Highlighted in red are the p-values that fall below a 0.10 threshold. Both
Telecommunications PI and Basic Materials PI had an average p-value below 0.10, with
0.0378 and 0.0833 respectively.
Below, the equity curves of non-compounding portfolios trading the Telecommunications PI
factor are displayed. The returns are detrended.
Figure 4.1.1 Non-compounding portfolios trading when
( ) and respectively with and linear
regression.
Figure 4.1.2 Non-compounding portfolios trading when
( ) and respectively with and linear
regression.
Figure 4.1.3 Non-compounding portfolios trading when
( ) and respectively with and linear
regression.
Figure 4.1.4 Non-compounding portfolios trading when
( ) and respectively with and linear
regression.
Figure 4.1.5 Non-compounding portfolios trading when
( ) and respectively with and linear
regression.
4.2 Sector spread analysis
After examining the impact of individual sectors on XACT OMXS30, I will test the impact of
the relative performance between two sectors as a different measure of market sentiment.
The theory behind this is that some sectors might be correlated with the general sentiment
among investors; if so, outperformance relative to sectors that correlate with the opposite
should be a positive sign for future price returns.
I first calculate the average daily return for each sector:

r_{s1}(N) = (1/N) · Σ_{i=1..N} (C_{s1,t−i+1} − C_{s1,t−i}) / C_{s1,t−i}

r_{s2}(N) = (1/N) · Σ_{i=1..N} (C_{s2,t−i+1} − C_{s2,t−i}) / C_{s2,t−i}

Both values are then percentile ranked against the distribution of values for the past 252
days and the difference is taken:

d(N) = rank(r_{s1}(N)) − rank(r_{s2}(N))

This difference is then again ranked against its own values for the past 252 days to account
for any trend and volatility bias that might exist between the normalized returns of the
sectors. This yields the final value, rank(d(N)).
Since the number of combinations is too large to run through with Bootstrap permutations,
the t-test (with absolute t) will be used to calculate the statistical significance.
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.21  0.09  0.26  0.48  0.34  0.32  0.48  0.20  0.05
Health Care PI           0.20   —    0.39  0.42  0.48  0.44  0.03  0.36  0.40  0.11
Industrials PI           0.08  0.37   —    0.42  0.12  0.23  0.08  0.50  0.10  0.00
Consumer Services PI     0.24  0.46  0.41   —    0.43  0.31  0.31  0.28  0.30  0.06
Consumer Goods PI        0.47  0.44  0.14  0.43   —    0.49  0.24  0.47  0.43  0.03
Utilities PI             0.34  0.41  0.22  0.33  0.49   —    0.34  0.43  0.36  0.38
Basic Materials PI       0.29  0.03  0.07  0.34  0.20  0.35   —    0.25  0.08  0.32
Technology PI            0.48  0.38  0.50  0.26  0.48  0.44  0.23   —    0.21  0.07
Telecommunications PI    0.19  0.39  0.10  0.31  0.41  0.36  0.07  0.23   —    0.20
XACT OMXS30              0.08  0.11  0.00  0.09  0.02  0.39  0.34  0.13  0.21   —
(Columns abbreviate the same nine sector indices and XACT OMXS30 as the row labels.)
Table 4.2.1 P-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.10  0.07  0.40  0.17  0.25  0.47  0.37  0.34  0.39
Health Care PI           0.09   —    0.22  0.26  0.25  0.26  0.38  0.50  0.15  0.10
Industrials PI           0.02  0.24   —    0.24  0.36  0.27  0.15  0.30  0.13  0.00
Consumer Services PI     0.37  0.23  0.16   —    0.44  0.47  0.41  0.39  0.18  0.20
Consumer Goods PI        0.14  0.24  0.45  0.40   —    0.43  0.06  0.27  0.03  0.06
Utilities PI             0.23  0.27  0.28  0.45  0.43   —    0.16  0.32  0.26  0.47
Basic Materials PI       0.43  0.44  0.17  0.48  0.10  0.17   —    0.27  0.38  0.21
Technology PI            0.37  0.49  0.29  0.40  0.29  0.32  0.22   —    0.07  0.04
Telecommunications PI    0.31  0.15  0.09  0.25  0.03  0.25  0.39  0.08   —    0.23
XACT OMXS30              0.37  0.09  0.00  0.20  0.05  0.47  0.23  0.06  0.19   —
Table 4.2.2 P-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.10  0.03  0.18  0.35  0.21  0.19  0.11  0.38  0.31
Health Care PI           0.06   —    0.21  0.34  0.19  0.43  0.12  0.12  0.09  0.03
Industrials PI           0.03  0.25   —    0.42  0.14  0.47  0.04  0.45  0.11  0.02
Consumer Services PI     0.17  0.31  0.45   —    0.47  0.32  0.18  0.27  0.15  0.06
Consumer Goods PI        0.38  0.14  0.06  0.47   —    0.30  0.41  0.12  0.27  0.15
Utilities PI             0.21  0.41  0.47  0.33  0.31   —    0.14  0.37  0.18  0.16
Basic Materials PI       0.22  0.11  0.06  0.22  0.41  0.16   —    0.05  0.28  0.43
Technology PI            0.10  0.11  0.44  0.28  0.13  0.37  0.05   —    0.05  0.13
Telecommunications PI    0.38  0.06  0.09  0.20  0.23  0.15  0.25  0.05   —    0.29
XACT OMXS30              0.31  0.04  0.03  0.10  0.19  0.16  0.39  0.14  0.33   —
Table 4.2.3 P-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.16  0.24  0.42  0.38  0.19  0.30  0.28  0.20  0.41
Health Care PI           0.20   —    0.43  0.39  0.47  0.30  0.37  0.40  0.43  0.14
Industrials PI           0.18  0.43   —    0.31  0.31  0.46  0.10  0.49  0.10  0.01
Consumer Services PI     0.37  0.34  0.27   —    0.46  0.49  0.44  0.36  0.23  0.15
Consumer Goods PI        0.41  0.50  0.26  0.47   —    0.48  0.41  0.36  0.12  0.03
Utilities PI             0.23  0.32  0.48  0.50  0.48   —    0.23  0.42  0.16  0.33
Basic Materials PI       0.31  0.39  0.14  0.37  0.48  0.23   —    0.24  0.44  0.32
Technology PI            0.28  0.43  0.50  0.36  0.38  0.43  0.24   —    0.03  0.21
Telecommunications PI    0.23  0.42  0.10  0.22  0.11  0.14  0.43  0.03   —    0.26
XACT OMXS30              0.34  0.12  0.01  0.17  0.02  0.33  0.30  0.22  0.25   —
Table 4.2.4 P-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.21  0.08  0.15  0.35  0.27  0.32  0.26  0.35  0.43
Health Care PI           0.19   —    0.42  0.42  0.37  0.45  0.30  0.42  0.22  0.30
Industrials PI           0.08  0.44   —    0.47  0.26  0.47  0.27  0.38  0.06  0.01
Consumer Services PI     0.12  0.48  0.45   —    0.26  0.35  0.39  0.40  0.08  0.02
Consumer Goods PI        0.38  0.39  0.20  0.26   —    0.38  0.41  0.16  0.26  0.25
Utilities PI             0.28  0.45  0.45  0.37  0.39   —    0.16  0.33  0.32  0.27
Basic Materials PI       0.30  0.33  0.29  0.47  0.41  0.15   —    0.18  0.33  0.25
Technology PI            0.25  0.44  0.41  0.40  0.16  0.29  0.16   —    0.15  0.33
Telecommunications PI    0.33  0.19  0.03  0.11  0.21  0.33  0.29  0.16   —    0.20
XACT OMXS30              0.47  0.27  0.02  0.06  0.30  0.24  0.19  0.33  0.30   —
Table 4.2.5 P-values from Student’s t-test for ( ) with
The same test is then performed on all values of N.
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.20  0.08  0.24  0.48  0.34  0.32  0.48  0.19  0.04
Health Care PI           0.20   —    0.39  0.41  0.48  0.44  0.03  0.35  0.40  0.12
Industrials PI           0.09  0.38   —    0.41  0.11  0.21  0.08  0.50  0.10  0.00
Consumer Services PI     0.26  0.46  0.42   —    0.43  0.31  0.32  0.28  0.30  0.07
Consumer Goods PI        0.47  0.43  0.14  0.43   —    0.49  0.25  0.46  0.44  0.02
Utilities PI             0.35  0.41  0.24  0.33  0.49   —    0.35  0.43  0.37  0.38
Basic Materials PI       0.29  0.03  0.07  0.33  0.18  0.34   —    0.24  0.07  0.32
Technology PI            0.48  0.39  0.50  0.25  0.48  0.44  0.24   —    0.23  0.09
Telecommunications PI    0.20  0.39  0.09  0.30  0.40  0.35  0.08  0.21   —    0.20
XACT OMXS30              0.08  0.10  0.00  0.08  0.02  0.39  0.34  0.10  0.21   —
Table 4.2.6 P-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.09  0.05  0.39  0.15  0.24  0.47  0.37  0.33  0.38
Health Care PI           0.10   —    0.22  0.25  0.26  0.27  0.39  0.50  0.15  0.10
Industrials PI           0.02  0.23   —    0.20  0.35  0.27  0.15  0.30  0.12  0.00
Consumer Services PI     0.38  0.25  0.19   —    0.44  0.47  0.41  0.39  0.19  0.24
Consumer Goods PI        0.16  0.23  0.45  0.39   —    0.43  0.09  0.28  0.03  0.08
Utilities PI             0.24  0.27  0.28  0.45  0.43   —    0.18  0.32  0.26  0.47
Basic Materials PI       0.42  0.43  0.16  0.48  0.07  0.15   —    0.25  0.36  0.21
Technology PI            0.38  0.49  0.29  0.39  0.27  0.32  0.24   —    0.08  0.05
Telecommunications PI    0.31  0.15  0.09  0.24  0.03  0.25  0.41  0.07   —    0.24
XACT OMXS30              0.38  0.09  0.00  0.16  0.04  0.47  0.22  0.05  0.18   —
Table 4.2.7 P-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.09  0.02  0.17  0.35  0.22  0.20  0.10  0.37  0.31
Health Care PI           0.07   —    0.21  0.33  0.18  0.43  0.11  0.11  0.07  0.03
Industrials PI           0.03  0.25   —    0.41  0.10  0.47  0.05  0.45  0.09  0.02
Consumer Services PI     0.19  0.32  0.46   —    0.47  0.33  0.21  0.29  0.17  0.09
Consumer Goods PI        0.38  0.15  0.10  0.46   —    0.30  0.42  0.12  0.25  0.15
Utilities PI             0.20  0.40  0.47  0.33  0.30   —    0.15  0.37  0.15  0.16
Basic Materials PI       0.20  0.12  0.05  0.19  0.40  0.14   —    0.04  0.25  0.42
Technology PI            0.11  0.12  0.44  0.26  0.13  0.38  0.06   —    0.05  0.13
Telecommunications PI    0.39  0.08  0.11  0.18  0.24  0.18  0.27  0.05   —    0.30
XACT OMXS30              0.31  0.03  0.03  0.07  0.18  0.16  0.40  0.13  0.32   —
Table 4.2.8 P-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.18  0.23  0.42  0.38  0.19  0.32  0.28  0.17  0.41
Health Care PI           0.18   —    0.43  0.38  0.47  0.32  0.37  0.39  0.43  0.14
Industrials PI           0.19  0.43   —    0.29  0.29  0.46  0.11  0.49  0.08  0.00
Consumer Services PI     0.38  0.35  0.29   —    0.46  0.49  0.44  0.37  0.24  0.18
Consumer Goods PI        0.41  0.50  0.27  0.47   —    0.48  0.42  0.35  0.11  0.03
Utilities PI             0.22  0.30  0.48  0.50  0.48   —    0.22  0.42  0.15  0.33
Basic Materials PI       0.30  0.39  0.13  0.36  0.48  0.24   —    0.22  0.43  0.30
Technology PI            0.28  0.44  0.50  0.35  0.38  0.43  0.25   —    0.03  0.23
Telecommunications PI    0.25  0.42  0.11  0.21  0.11  0.16  0.44  0.03   —    0.28
XACT OMXS30              0.34  0.11  0.01  0.14  0.02  0.33  0.32  0.20  0.22   —
Table 4.2.9 P-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.21  0.08  0.13  0.35  0.28  0.34  0.25  0.34  0.43
Health Care PI           0.19   —    0.43  0.41  0.38  0.45  0.30  0.42  0.21  0.31
Industrials PI           0.08  0.44   —    0.47  0.24  0.47  0.28  0.38  0.04  0.01
Consumer Services PI     0.14  0.48  0.45   —    0.28  0.36  0.40  0.40  0.08  0.04
Consumer Goods PI        0.38  0.39  0.21  0.24   —    0.39  0.42  0.15  0.23  0.27
Utilities PI             0.28  0.45  0.45  0.36  0.38   —    0.15  0.33  0.32  0.26
Basic Materials PI       0.28  0.32  0.28  0.47  0.40  0.16   —    0.16  0.31  0.23
Technology PI            0.26  0.44  0.41  0.40  0.17  0.29  0.18   —    0.15  0.34
Telecommunications PI    0.35  0.21  0.05  0.10  0.23  0.34  0.32  0.15   —    0.21
XACT OMXS30              0.47  0.27  0.02  0.03  0.27  0.26  0.21  0.32  0.29   —
Table 4.2.10 P-values from Student’s t-test for ( ) with
In order to better determine which sector spreads are useful as predictive factors for
OMXS30, the following tables contain the average p-value over all five tested values of N. It
is reasonable to assume that if a factor really does contain useful information, its
performance would not be limited to one setting alone. A factor that is highly dependent on
a specific parameter is probably not stable out-of-sample, since the best parameter can
change very rapidly in the market.
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.15  0.10  0.28  0.35  0.25  0.32  0.30  0.29  0.32
Health Care PI           0.15   —    0.34  0.36  0.35  0.38  0.24  0.36  0.26  0.14
Industrials PI           0.08  0.35   —    0.37  0.24  0.38  0.13  0.42  0.10  0.01
Consumer Services PI     0.25  0.37  0.35   —    0.41  0.39  0.35  0.34  0.19  0.10
Consumer Goods PI        0.36  0.34  0.22  0.41   —    0.42  0.31  0.28  0.22  0.10
Utilities PI             0.26  0.37  0.38  0.40  0.42   —    0.21  0.37  0.26  0.32
Basic Materials PI       0.31  0.26  0.15  0.38  0.32  0.21   —    0.20  0.30  0.30
Technology PI            0.30  0.37  0.42  0.34  0.29  0.37  0.18   —    0.10  0.16
Telecommunications PI    0.29  0.24  0.08  0.22  0.20  0.25  0.29  0.11   —    0.24
XACT OMXS30              0.31  0.13  0.01  0.12  0.12  0.32  0.29  0.17  0.26   —
Table 4.2.11 Average p-values from Student’s t-test for ( ) with
Sector (s1 \ s2)         Fin   HC    Ind   CoSe  CoGo  Util  BaMa  Tech  Tele  XACT
Financials PI             —    0.15  0.09  0.27  0.34  0.25  0.33  0.29  0.28  0.31
Health Care PI           0.15   —    0.34  0.36  0.36  0.38  0.24  0.36  0.25  0.14
Industrials PI           0.08  0.35   —    0.36  0.22  0.38  0.13  0.42  0.09  0.01
Consumer Services PI     0.27  0.37  0.36   —    0.42  0.39  0.36  0.35  0.20  0.12
Consumer Goods PI        0.36  0.34  0.23  0.40   —    0.42  0.32  0.28  0.21  0.11
Utilities PI             0.26  0.37  0.38  0.39  0.42   —    0.21  0.37  0.25  0.32
Basic Materials PI       0.30  0.26  0.14  0.36  0.31  0.21   —    0.18  0.29  0.30
Technology PI            0.30  0.37  0.43  0.33  0.29  0.37  0.19   —    0.11  0.17
Telecommunications PI    0.30  0.25  0.09  0.21  0.20  0.25  0.30  0.10   —    0.25
XACT OMXS30              0.31  0.12  0.01  0.10  0.11  0.32  0.30  0.16  0.25   —
Table 4.2.12 Average p-values from Student’s t-test for ( ) with
Based on the observations in the tables, there is only one sector spread that passes a
threshold of 0.05: Industrials PI versus XACT OMXS30. Overall the industrial companies seem
to be strong drivers of OMXS30, exhibiting the mean reversion tendency that also drives
OMXS30. To illustrate this I drew the equity chart of a non-compounding portfolio trading
the spread on detrended data.
Figure 4.2.1 Non-compounding portfolios trading when
( ) and respectively with and
linear regression.
Figure 4.2.2 Non-compounding portfolios trading when
( ) and respectively with and
linear regressions.
Figure 4.2.3 Non-compounding portfolios trading when
( ) and respectively with and
linear regressions.
Figure 4.2.4 Non-compounding portfolios trading when
( ) and respectively with and
linear regressions.
Figure 4.2.5 Non-compounding portfolios trading when
( ) and respectively with and
linear regressions.
5. Combining the Predictive Factors into a Learning Ensemble System

Based on the research in the previous chapters, the following factors were robust enough to
warrant further analysis.
Price Change of XACT OMXS30
Trading Day of the Month
Price Change of Telecommunications PI
Price Change of Industrials PI vs. Price Change of XACT OMXS30
These four factors all proved to be predictive of the next day’s price change for XACT
OMXS30 below a 0.05 significance level. In order to incorporate them into a final trading
model ready for use, three different techniques will be used.
5.1 Ensemble Trading Model
When building the final trading model, three methods will be used that are described here.
5.1.1 Artificial Neural Networks
Inspired by the way biological neural networks in the brains of animals work, Artificial Neural
Networks are mathematical models used to process information by feeding it into an
interconnected group of artificial neurons (Wikipedia, 2012). The interconnected nature of
neural networks is what accounts for the complex behavior displayed by animals. The main
strength of neural networks is their ability to detect hidden non-linear patterns between
the inputs and the outputs. This has made them popular for applications such as e-mail spam
filtering, handwritten text recognition and time series prediction.
Figure 5.1.1 Illustration of a neural network with four layers, one input layer with eight
neurons, two hidden layers with eleven and five neurons respectively and one output layer
with one output neuron.4
One of the main disadvantages of artificial neural networks is their black-box nature. Since
the number of connections increases with the number of neurons added, no human can
trace exactly how even a small neural network makes its decisions. One implication of this
is that the risk of overfitting the model to the data increases, which makes neural networks
inefficient to apply when data is scarce.
The usual network consists of neurons with connections, each one with a weight. The
neuron sums the inputs x_i from its connections multiplied by their weights w_i:

s = Σ_i w_i · x_i

The signal is then transferred by the neuron’s activation function. This function can take
many forms; for example, it could be a threshold function:

f(s) = 1 if s > T, else 0

A binary number is sent to the following connections depending on whether the summed
value exceeds the threshold value T for that specific neuron. By adjusting the weights w_i
and the threshold values T, the network can “learn” patterns and produce the sought output.
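A minimal sketch of such threshold neurons and a one-hidden-layer feedforward pass, assuming binary threshold activations throughout as described above (this is my own illustration, not the paper's implementation):

```python
def neuron(inputs, weights, threshold):
    """Threshold neuron: fire 1 if the weighted input sum exceeds the
    neuron's threshold, else 0."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0

def feedforward(inputs, hidden, output):
    """One hidden layer feeding a single output neuron. `hidden` is a list
    of (weights, threshold) pairs; `output` is one such pair taking the
    hidden-layer firings as its inputs."""
    h = [neuron(inputs, w, t) for w, t in hidden]
    return neuron(h, *output)
```

For example, a single hidden neuron with weights (1, 1) and threshold 1.5 makes the network fire only when both inputs are 1, a pattern no single weighted input could express with a lower-capacity rule.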
5.1.2 Genetic Algorithms
Inspired by the Darwinian idea of natural selection, Genetic Algorithms, a subset of
Evolutionary Algorithms, are an optimization technique designed to evolve the best solution
through features such as recombination and mutation (Brownlee, 2011).
The general outline of most genetic algorithms can be expressed in the following procedure:
1. Initialize a population of random solutions.
2. Evaluate the fitness of each solution in the population.
3. Repeat the following steps until some objective is met:
   1. Select the best solutions in the population for reproduction.
   2. Create a new population of solutions by recombining the best from the former
      and applying occasional mutations to their chromosomes.
   3. Evaluate the fitness of each new solution.
4 Picture retrieved from http://www.optimaltrader.net/neural_network.htm 2012-03-31.
As each population is derived from the fittest members of the last, the average fitness
will increase as the algorithm runs. With time this increase in fitness will fade as the
population converges on a maximum that is hopefully the global maximum of the
parameter space5.
Figure 5.1.2 General outline of an evolutionary algorithm.
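The outlined procedure can be sketched as a generational GA minimizing a fitness function (lower is better, matching the p-value fitness used later). All parameter values here are illustrative, not the paper's:

```python
import random

def genetic_minimize(fitness, dim, pop_size=30, generations=60,
                     elite=10, mut_rate=0.05, seed=0):
    """Generational GA following the outlined steps: random init, select
    the fittest, recombine with uniform crossover, mutate occasionally."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:elite]                          # step 3.1: selection
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < 0.5 else bi  # step 3.2: recombination
                     for ai, bi in zip(a, b)]
            for i in range(dim):                       # occasional mutation
                if rng.random() < mut_rate:
                    child[i] += rng.uniform(-1, 1)
            children.append(child)
        pop = parents + children                       # step 3.3: re-evaluated next loop
    return min(pop, key=fitness)
```

Keeping the elite unchanged between generations guarantees that the best fitness seen so far never deteriorates, which is why the average fitness rises as the algorithm runs.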
5.1.3 Ensemble learning
In order to achieve better prediction or classification, multiple learning models can be
combined. This process is called ensemble learning and can be of great use when dealing
with models trained on samples so small that generalization becomes a problem (Robi
Polikar, 2011). A small sample might not fully represent the data distribution, and the model
might therefore learn patterns that do not persist on a validation set. Another use of
ensemble modeling is on non-stationary data, such as price history. It is likely that the point
representing the global maximum of a parameter optimization will not stay at the peak of
the parameter space going forward, since the most profitable strategy changes with shifting
conditions. By averaging the output from the points around the area of the global maximum,
and maybe from the points around the local maxima as well, superior robustness can be
achieved.
5 A vector with N parameters can be thought of as a specific point in an N-dimensional space. This is the
parameter space.
Figure 5.1.3 Example of a three-dimensional parameter space, based on the total return for
a strategy trading long/short OMXS30 when a crossover occurs between two M- and N-day
moving averages of the closing price.
5.2 Method
The trading system presented here is based on the aggregated output of several hundred
artificial neural networks trained by a genetic algorithm. There are four inputs to the
networks: the percentile-ranked N-day price return of XACT OMXS30, the percentile-ranked
N-day return of the Telecommunications sector index, the percentile-ranked N-day relative
return of the Industrials sector index versus XACT OMXS30, and the value 1 whenever today
is the first or third-last trading day of the month (else 0). Since all inputs are within the
range of 0 and 1, the initial weights and thresholds are set to values between -1 and 1. This
allows the network maximum flexibility with regard to its inputs.
The four input neurons are connected to four neurons in the hidden layer. The number of
neurons in the hidden layer was chosen based on the number of input neurons. More
neurons might give greater complexity, but with the tradeoff that the networks might fail to
properly generalize; fewer neurons might not catch all of the patterns that exist. The general
architecture of the neural networks is illustrated in the following figure.
Figure 5.2.1 Neural network topology
The activation function used for the neurons in the hidden layer and in the output layer is
the threshold function described earlier. The values to be optimized can be expressed as a
vector v = [w_1, ..., w_20, T_1, ..., T_5], where the w_i are the connection weights (4 × 4 into
the hidden layer plus 4 × 1 into the output neuron) and the T_j are the thresholds of the five
neurons, for a total of 25 values. In order to optimize this I use a steady-state genetic
algorithm with tournament selection and a population size of 100 individuals. An output of
1 from the output neuron is interpreted as a buy signal for the portfolio and 0 as a sell
signal. The fitness function used is the right-side p-value from a Student’s t-test of the
network’s traded returns (i.e. the probability of observing returns this profitable under the
null hypothesis), which the algorithm minimizes. To avoid the problem that the sample size
might be too small for proper evaluation of the significance, the fitness is set to 1 (i.e. no
evidence that the return is positive due to anything other than random chance) whenever
the number of days the network fired is smaller than 100. This eliminates all candidates
that might have high significance only due to overfitting a small number of occurrences.
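The fitness rule just described can be sketched as follows; the normal approximation to the t distribution and the exact shape of the guard are my assumptions, not the paper's code:

```python
import math
from statistics import mean, stdev

def ensemble_fitness(signals, returns, min_days=100):
    """Fitness of one network: the one-sided p-value of the returns earned
    on the days the network fired (lower is better). Networks that fired on
    fewer than `min_days` days get fitness 1, the over-fitting guard from
    the text. Normal approximation in place of the exact t distribution."""
    traded = [r for s, r in zip(signals, returns) if s == 1]
    if len(traded) < min_days:
        return 1.0
    t = mean(traded) / (stdev(traded) / math.sqrt(len(traded)))
    return 0.5 * math.erfc(t / math.sqrt(2))  # small when the mean is clearly positive
```

A network with a negative mean traded return gets a p-value above 0.5, so the selection pressure pushes the population toward networks with genuinely positive returns, not merely active ones.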
A steady-state genetic algorithm differs from a generational one in that the population is
continually updated, instead of a whole new population (generation) being created each
iteration. The tournament selection method works by first selecting two individuals a and b
at random from the population. If the fitness value of a is less than that of b,
fitness(a) < fitness(b), then a is selected as one of the parents for the new individual
and b is selected as the individual about to be replaced. This process is then repeated to
select the other parent. The offspring is created by recombining the network weight and
threshold vectors of the parents at random, so that for example a given weight in the
offspring has a fifty percent chance of coming from either parent. This is called Uniform
Crossover and should give the offspring approximately half its genes from one parent and
half from the other. In order to promote diversity and continual evolution there is a small
probability that a mutation occurs in the genes of the new individual: with a probability of
0.5%, a random value between -1 and 1 is added to the value in the gene, and by that the
population extends its search for the global maximum. 0.5% was selected so that rapid
mutations would not deteriorate the average fitness of the population6. The genetic
algorithm is initiated with individuals with random values for their genes and terminated
when 200,000 iterations have been made, or when 15,000 iterations have been made
without an improvement in the best individual’s fitness (indicating that the algorithm has
reached a minimum). The final chromosome (or network setting) is the best individual in the
final population, i.e. the one with the lowest fitness and hence the lowest p-value.
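One steady-state update with tournament selection, uniform crossover and per-gene mutation could look like this sketch (my own rendering of the operators described, not the paper's code):

```python
import random

def steady_state_step(pop, fitness, rng, mut_rate=0.005):
    """One steady-state update: tournaments pick two parents and the
    individual to replace, uniform crossover builds the child, and each
    gene mutates with probability mut_rate by adding uniform(-1, 1) noise."""
    def tournament():
        i, j = rng.sample(range(len(pop)), 2)
        # Lower fitness (p-value) wins; the loser of the first tournament
        # becomes the individual about to be replaced.
        return (i, j) if fitness(pop[i]) < fitness(pop[j]) else (j, i)
    p1, victim = tournament()
    p2, _ = tournament()
    child = [a if rng.random() < 0.5 else b            # uniform crossover
             for a, b in zip(pop[p1], pop[p2])]
    child = [g + rng.uniform(-1, 1) if rng.random() < mut_rate else g
             for g in child]
    pop[victim] = child
```

Because the current best individual can never lose a tournament, it is never the one replaced, so the best fitness in the population is non-increasing across updates.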
To test the networks’ ability to properly learn patterns and generalize to unseen data, the
data set is divided into one in-sample period, where the fitness for the genetic algorithm is
calculated and the networks are optimized, and one out-of-sample period where the
networks are tested. In order to improve the ensemble’s ability to generalize, the output of
2,500 independently optimized neural networks, 500 for each N-day period setting for the
inputs, is aggregated into a signal ranging from 0 to 2,500.
5.3 Results
To evaluate the results of the strategy, two performance metrics will be used. The first one is
the Compound Annual Growth Rate, or CAGR, defined as:

CAGR = ( V(end) / V(start) )^(252 / n) − 1

where V(start) is the start value of the portfolio, V(end) is the last value and n is the
number of days. The numerator in the exponent scales the daily returns to yearly, with 252
being the number of trading days in a year.
6 In nature, massive mutations caused by e.g. radioactive fallout are never beneficial as many mutations simultaneously are extremely unlikely to be anything other than negative for the carrier. Small gradual steps however, are what support long term evolution of species.
The second one is the Sharpe Ratio, a metric that measures the return in relation to the
risk, defined as the standard deviation of returns. It is named after the Nobel Prize
winner William Forsyth Sharpe. Its definition is:

Sharpe = (252 · r̄) / (√252 · σ)

where r̄ is the mean daily return and √252 · σ is the annualized standard deviation of daily
returns. Usually the risk-free rate of return is also factored into the formula so that the
numerator is the excess return, but in this paper the risk-free return is set to zero.
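Both metrics are straightforward to compute from a portfolio's value series; a sketch following the definitions above, with the risk-free rate fixed at zero:

```python
import math
from statistics import mean, stdev

def cagr(v_start, v_end, n_days):
    """Compound annual growth rate: (V_end / V_start)^(252 / n) - 1."""
    return (v_end / v_start) ** (252 / n_days) - 1

def sharpe_ratio(daily_returns):
    """Annualized Sharpe with a zero risk-free rate:
    (252 * mean) / (sqrt(252) * stdev), i.e. sqrt(252) * mean / stdev."""
    return 252 * mean(daily_returns) / (math.sqrt(252) * stdev(daily_returns))
```

For example, a portfolio that doubles over exactly 252 trading days has a CAGR of 100%.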
5.3.1 In-sample performance
2,500 artificial neural networks were optimized for XACT OMXS30 during the period from
the 21st of January 2004 to the 4th of January 2010. The aggregated signal ranges from 0 to
2,500, with a higher value indicating that a greater number of networks are firing a buy
signal. By letting the exposure to the market scale with the signal’s strength,

exposure = (1/2500) · Σ_{i=1..2500} o_i

where o_i is the output of network i, all information contained in the signal can be used.
For example, a signal of 1,250 means that 50% of the portfolio will be invested and the rest
stays in cash. The following chart illustrates the performance using this approach.
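The two ways of trading the aggregated signal can be sketched as one-liners; the function names are mine, not the paper's:

```python
def scaled_exposure_return(signal, market_return, n_networks=2500):
    """Invest the fraction of networks firing: a signal of 1250 out of 2500
    means 50% invested and 50% in cash."""
    return (signal / n_networks) * market_return

def majority_vote_return(signal, market_return, n_networks=2500):
    """All-in when a strict majority of the networks fire, otherwise flat."""
    return market_return if signal > n_networks / 2 else 0.0
```

Scaling trades on every gradation of the signal, while the majority vote collapses it into a binary in-or-out decision; the performance of both variants is compared below.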
Figure 5.3.1 Portfolios with scaling position-size based on output of the neural network
ensemble.
Due to portfolio constraints, this approach might not be possible for some traders. Splitting
the capital into 2,500 equal parts without excessive drag from the commissions paid on
many small trades would require a lot of capital. A more realistic way of trading the
aggregated signal from the ensemble would be to wait for a higher value before initiating a
trade, for example 1,250, which corresponds to a majority of the networks firing on the
same day.
Figure 5.3.2 Portfolios trading when the output of the neural network ensemble indicates
that the majority of the networks are firing.
Ensemble system exposure method CAGR Sharpe Ratio Significance
Scaling Position-size 26.75% 3.47 0.0000
Majority Vote 30.91% 3.24 0.0000
Table 5.3.1 Performance metrics for the in-sample period.
The significance based on a Bootstrap test (on detrended daily returns) is extremely low,
which is what one would expect since the ensemble was evolved for its significance in the
sample. Both the Sharpe Ratio and the CAGR are very high, with a higher Sharpe but lower
CAGR when scaling into the position as opposed to being either all in or all out. This is
because adjusting the position size to the signal’s strength smooths the daily returns, and
hence the standard deviation is lower.
5.3.2 Out-of-sample performance
When dealing with optimization one always has to consider that the results are fitted in
hindsight. In order to test the validity of a solution it has to be cross-validated on an
independent data set. This was done by leaving out 537 of the days in the sample period
from the fitness calculation when running the genetic algorithm to train the neural
networks. On the period from the 5th of January 2010 to the 17th of February 2012 the
ensemble is tested again to analyze its ability to generalize the patterns it learnt during the
in-sample period. Other than the factors selected as input to the networks, no part of the
ensemble trading system was changed as a result of the in-sample data. Below is the
performance of the two ways the aggregated signal might be traded.
Figure 5.3.3 Portfolios with scaling position-size based on output of the neural network
ensemble.
Figure 5.3.4 Portfolios trading when the output of the neural network ensemble indicates
that the majority of the networks are firing.
Ensemble system exposure method CAGR Sharpe Ratio Significance
Scaling Position-size 14.83% 2.15 0.0058
Majority Vote 17.37% 1.88 0.0077
Table 5.3.2 Performance metrics for the out-of-sample period.
The p-value from the Bootstrap test (with detrended daily returns) is very low both for the
portfolio with a scaling position-size and for the portfolio that trades based on what
the majority of the neural networks say. The p-value was lower for the ensemble system
than for every individual factor tested, which indicates that its ability to learn and
combine the inputs is superior to simple rule combinations.
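The Bootstrap test used throughout (the C# version is listed in Appendix C) can be summarized in a compact Python sketch: detrend the returns by subtracting their mean, resample with replacement many times, and count how often a resampled mean is at least as extreme as the observed one. The iteration count and seed here are illustrative.

```python
import random

def bootstrap_p_value(daily_returns, iterations=10000, seed=42):
    """One-sided bootstrap p-value for the mean of a daily return series.

    Resamples the mean-subtracted (detrended) series with replacement and
    counts how often the resampled mean reaches the observed mean.
    """
    rng = random.Random(seed)
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    detrended = [r - mean for r in daily_returns]
    hits = 0
    for _ in range(iterations):
        resample_mean = sum(rng.choice(detrended) for _ in range(n)) / n
        if resample_mean >= mean:
            hits += 1
    return hits / iterations
```

A strongly positive mean relative to the series' variability yields a p-value near zero; a mean near zero yields a p-value near one half.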
6. Conclusion
This paper started out by introducing the topic of speculation and the difference between
objective and subjective decisions. It then presented the framework for testing and verifying
objective trading decisions on which all following chapters were based. The result of the
factor selection process, using the non-parametric hypothesis test called Bootstrap
resampling as well as Student's t-test, was that four factors in particular displayed a
tendency to predict the future returns of the Swedish stock market, with a low probability
that the results were due to random chance alone. The four factors were then used as inputs
to artificial neural networks that were trained during an in-sample period from 2004 to
2010 using a genetic algorithm with steady-state tournament selection. 2,500 of these
networks were trained, and the aggregated output showed strong results during both the in-sample
period and the out-of-sample period 2010 to 2012. The p-value during the out-of-sample
period, calculated by Bootstrap resampling, was low enough to make me confident
that the results were not due to chance. By investing based on what the majority of the
networks in the ensemble said, an annual return of 17% was generated during the period
unused by the training method, with a Sharpe ratio of 1.88, indicating that the returns
obtained were high relative to the risk taken.
One thing that was not examined in this paper is how the factor selection and performance
carry over to other liquid securities. Future papers could, for example, examine what effect
the factors tested in this paper have on the individual stocks within the OMXS30, and
perhaps do the same for other stock indices around the world. The inputs to the neural
network could also be complemented with the current day's volume and volatility of the
security being optimized, so that more of the information contained in the price series could
be used to make the prediction. The optimization method could also take into account more
realistic effects of slippage, so that the ensemble can evolve the optimal solution for the
given commission structure. This could be done by changing the type of network trained from
a feedforward one to a recurrent one, so that the network also takes its current position as
an input.
In conclusion, there is a possibility to achieve higher risk-adjusted returns by trading
objectively based on historical tests. However, the results can only be verified after the
trading has been done in real time with real money.
7. References
Aronson, D. R. (2006). Hypothesis Tests and Confidence Intervals. In D. R. Aronson, Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals (pp. 217-255). Chichester: John Wiley & Sons Ltd.
Brownlee, J. (2011, January 1). Nature-Inspired Programming Recipes. Retrieved March 31, 2012, from Clever Algorithms: http://www.cleveralgorithms.com/nature-inspired/index.html
Bukey, D. (2007, February 1). David Aronson: Struck by Science. Retrieved December 31, 2011, from InvivoAnalytics.com: http://invivoanalytics.com/wp-content/uploads/2008/03/ARONSON_200702.pdf
Hassler, F. (2011, November 23). The impact of survivorship bias. Retrieved February 19, 2012, from Engineering Returns: http://engineering-returns.com/2011/11/23/the-impact-of-survivorship-bias/
Kaplan, I. (2004, October 1). Book reviews. Retrieved January 1, 2012, from Bear Products International Home Page: http://www.bearcave.com/bookrev/misbehavior_of_markets.html
McConnell, J. J., & Xu, W. (2006, August 20). Equity Returns at the Turn of the Month. Retrieved March 31, 2012, from Social Science Research Network: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=925589
Polikar, R. (2011, October 21). Ensemble learning. Retrieved March 30, 2012, from Scholarpedia: http://www.scholarpedia.org/article/Ensemble_learning
Stokes, M. (2009, February 12). Short-term Mean-Reversion Becoming Stronger: Part II (The Why). Retrieved February 22, 2012, from MarketSci Blog: http://marketsci.wordpress.com/2009/02/12/short-term-mean-reversion-becoming-stronger-part-ii-the-why/
Thornton, S. (1997, November 13). Karl Popper. Retrieved December 31, 2011, from Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/entries/popper/#SciKnoHisPre
Wikipedia. (2011, October 19). Backtesting. Retrieved November 13, 2011, from Wikipedia: http://en.wikipedia.org/wiki/Backtesting
Wikipedia. (2012, March 20). Artificial neural network. Retrieved March 26, 2012, from Wikipedia, the free encyclopedia: http://en.wikipedia.org/wiki/Artificial_neural_network
XACT. (n.d.). Fonder. Retrieved February 19, 2012, from XACT: http://www.xact.se/fonder/bred-aktiemarknad/xact-omxs30/
XACT. (2012, February 17). Utbildning – Historiska utdelningar. Retrieved February 27, 2012, from http://www.xact.se/: http://www.xact.se/utbildning/utdelning-fran-fonderna/historiska-utdelningar/#
Appendix A – Excel Testing
This appendix describes the formulas used when testing the factors in the paper in Microsoft Excel 2010.
Given that the closing price is in column D, the daily return can be calculated with a formula of the form:
=D510/D509-1
The mean daily return can be calculated from the cells in the range F510 to F2541 with the formula for averages:
=AVERAGE(F510:F2541)
The percentile rank is calculated as the value in H258 in relation to all the values in the range H7:H258:
=PERCENTRANK(H7:H258,H258,10)
The last value in the formula, 10, specifies the number of decimals.
To test the value, a logical function is used:
=IF(I509>0.5,G510,"")
If the value in the cell I509 exceeds 0.5, the value from G510 is returned, else no value at all is returned.
To calculate the t-significance, the number of occurrences and the sample standard deviation have to be calculated, for example:
=(AVERAGE(G510:G2541)-0)/(STDEV(G510:G2541)/SQRT(COUNT(G510:G2541)))
The zero specifies that we are testing the hypothesis that the mean is equal to zero (since the data is detrended).
The significance is then calculated by the built-in formula in Excel for Student's t-test:
=TDIST(ABS(J509),COUNT(G510:G2541)-1,1)
The absolute value of the t-statistic is checked against the t-distribution with n-1 degrees of freedom.
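As a cross-check of the spreadsheet computation, the same one-sample t-statistic can be formed in a few lines of Python, mirroring the (AVERAGE − 0) / (STDEV / SQRT(COUNT)) structure. The sample here is made up for illustration.

```python
import math

def t_statistic(sample, mu0=0.0):
    """One-sample t-statistic against the hypothesized mean mu0.

    Mirrors the spreadsheet form: (mean - mu0) / (stdev / sqrt(n)),
    using the sample (n - 1) standard deviation, as Excel's STDEV does.
    """
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (sd / math.sqrt(n))

print(t_statistic([1.0, 2.0, 3.0]))  # 2 * sqrt(3), about 3.464
```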
Appendix B – C# code for implementing the neural networks
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Meta.Numerics;
using Meta.Numerics.Statistics;
using System.IO;
namespace Final_Neural_Network_for_The_Project_Nova_Paper
{
class Program
{
static void Main(string[] args)
{
int PopulationSize = 100;
int MaxNumberOfTrainingIterations = 200000;
int MaxNumberOfIterationsFromLastImprovment = 15000;
double MutationRate = 0.005d;
int ValuesPerInput = 2032;
//Counts the number of rows in the input-file
int n = 0;
using (TextReader r = File.OpenText("C:/Input to Project Nova.txt"))
{
while (r.ReadLine() != null)
{
n++;
}
}
Console.WriteLine("There are " + n + " values in the file.");
//Puts the values from the txt-file into an array
double[] values = new double[n];
using (TextReader r = File.OpenText("C:/Input to Project Nova.txt"))
{
string line;
for (int i = 0; i < n; i++)
{
line = r.ReadLine();
double.TryParse(line, out values[i]);
}
}
int[] OldPredictions = new int[2032];
for (int Ensemble = 0; Ensemble < 500; Ensemble++)
{
Console.Clear();
Console.WriteLine(Ensemble);
//Creates the initial population and assigns it random values
double[] DNA = new double[25 * PopulationSize];
double[] Fitness = new double[PopulationSize];
Random random = new Random();
for (int i = 0; i < (25 * PopulationSize); i++)
{
DNA[i] = (random.NextDouble() * (random.Next(0, 2) * 2 - 1));
}
//Evaluates the entire population by their significance from a one-sample t-test.
//Computes the neural network.
for (int i = 0; i < PopulationSize; i++)
{
//declares and sets an array that contains the individual
double[] IndividualEvaluated = new double[25];
for (int d = 0; d < 25; d++)
{
IndividualEvaluated[d] = DNA[d + 25 * i];
}
//declares a sample that will contain the daily returns whose significance we will later calculate
Sample tsample = new Sample();
//tests the neural network settings
for (int g = 0; g < 1495; g++)
{
double[] HiddenLayer = new double[4];
double Output = 0;
int gene = 0;
//sums the weights multiplied by the inputs for each neuron in the hidden layer
for (int k = 0; k < 4; k++)
{
for (int p = 1; p < 5; p++)
{
HiddenLayer[k] += values[g + ValuesPerInput * p] * IndividualEvaluated[gene];
gene++;
}
}
//performs the neuron's activation function, sends a binary number if the threshold value is exceeded
for (int k = 0; k < 4; k++)
{
if (HiddenLayer[k] > IndividualEvaluated[gene])
{
HiddenLayer[k] = 1;
}
else
{
HiddenLayer[k] = 0;
}
gene++;
}
//sums the weights times the signals from the hidden layer in the output neuron
for (int k = 0; k < 4; k++)
{
Output += HiddenLayer[k] * IndividualEvaluated[gene];
gene++;
}
//if the output neuron fires, the return for that day is added to tsample
if (Output > IndividualEvaluated[gene])
{
tsample.Add(values[g]);
}
}
//checks that the sample size is not too low
if (tsample.Count > 100)
{
TestResult fitness = tsample.StudentTTest(0);
//assigns the individual its significance
Fitness[i] = fitness.RightProbability;
}
else
{
Fitness[i] = 1;
}
}
//starts the breeding
int IterationsWhenLastImprovement = 0;
for (int i = 0; i < MaxNumberOfTrainingIterations && (i - IterationsWhenLastImprovement) < MaxNumberOfIterationsFromLastImprovment; i++)
{
int Parent1;
int Parent2;
int tIndividual1;
int tIndividual2;
int Exiled;
//selects the random individual that has the lowest fitness value (t-significance)
//the loser of the tournament is the one being replaced (Exiled)
tIndividual1 = random.Next(0, PopulationSize);
tIndividual2 = random.Next(0, PopulationSize);
if (Fitness[tIndividual2] < Fitness[tIndividual1])
{
Parent1 = tIndividual2;
Exiled = tIndividual1;
}
else
{
Parent1 = tIndividual1;
Exiled = tIndividual2;
}
//selects the other parent
tIndividual1 = random.Next(0, PopulationSize);
tIndividual2 = random.Next(0, PopulationSize);
if (Fitness[tIndividual2] < Fitness[tIndividual1])
{
Parent2 = tIndividual2;
}
else
{
Parent2 = tIndividual1;
}
//declares and sets an array that contains the new individual by recombining the parents, and performs random mutations on its chromosomes
double[] IndividualEvaluated = new double[25];
for (int d = 0; d < 25; d++)
{
if (random.Next(0, 2) == 0)
{
IndividualEvaluated[d] = DNA[d + 25 * Parent1];
}
else
{
IndividualEvaluated[d] = DNA[d + 25 * Parent2];
}
if (random.NextDouble() < MutationRate)
{
IndividualEvaluated[d] += (random.NextDouble() * (random.Next(0, 2) * 2 - 1));
}
}
//moves on and tests the new network
Sample tsample = new Sample();
for (int g = 0; g < 1495; g++)
{
double[] HiddenLayer = new double[4];
double Output = 0;
int gene = 0;
//sums the weights multiplied by the inputs for each neuron in the hidden layer
for (int k = 0; k < 4; k++)
{
for (int p = 1; p < 5; p++)
{
HiddenLayer[k] += values[g + ValuesPerInput * p] * IndividualEvaluated[gene];
gene++;
}
}
//performs the neuron's activation function, sends a binary number if the threshold value is exceeded
for (int k = 0; k < 4; k++)
{
if (HiddenLayer[k] > IndividualEvaluated[gene])
{
HiddenLayer[k] = 1;
}
else
{
HiddenLayer[k] = 0;
}
gene++;
}
//sums the weights times the signals from the hidden layer in the output neuron
for (int k = 0; k < 4; k++)
{
Output += HiddenLayer[k] * IndividualEvaluated[gene];
gene++;
}
//if the output neuron fires, the return for that day is added to tsample
if (Output > IndividualEvaluated[gene])
{
tsample.Add(values[g]);
}
}
//find the best in the population
int best = 0;
for (int j = 0; j < PopulationSize; j++)
{
if (Fitness[j] < Fitness[best])
{
best = j;
}
}
//checks that the sample size is not too low
if (tsample.Count > 100)
{
TestResult fitness = tsample.StudentTTest(0);
//assigns the individual (the one selected to leave) its significance
Fitness[Exiled] = fitness.RightProbability;
}
else
{
Fitness[Exiled] = 1;
}
//checks if the new individual is better than the previous best; if so, an improvement was made
if (Fitness[Exiled] < Fitness[best])
{
IterationsWhenLastImprovement = i;
}
//the new individual replaces the less fit one
for (int d = 0; d < 25; d++)
{
DNA[d + 25 * Exiled] = IndividualEvaluated[d];
}
}
//find the best in the final population
int fbest = 0;
for (int i = 0; i < PopulationSize; i++)
{
if (Fitness[i] < Fitness[fbest])
{
fbest = i;
}
}
//tests the best network and saves its predictions for each day in a text file
double[] BestIndividual = new double[25];
for (int d = 0; d < 25; d++)
{
BestIndividual[d] = DNA[d + 25 * fbest];
}
//this array will hold the predictions made by this particular network
int[] predictions = new int[2032];
//tests the individual on the entire sample
for (int g = 0; g < 2032; g++)
{
double[] HiddenLayer = new double[4];
double Output = 0;
int gene = 0;
//sums the weights multiplied by the inputs for each neuron in the hidden layer
for (int k = 0; k < 4; k++)
{
for (int p = 1; p < 5; p++)
{
HiddenLayer[k] += values[g + ValuesPerInput * p] * BestIndividual[gene];
gene++;
}
}
//performs the neuron's activation function, sends a binary number if the threshold value is exceeded
for (int k = 0; k < 4; k++)
{
if (HiddenLayer[k] > BestIndividual[gene])
{
HiddenLayer[k] = 1;
}
else
{
HiddenLayer[k] = 0;
}
gene++;
}
//sums the weights times the signals from the hidden layer in the output neuron
for (int k = 0; k < 4; k++)
{
Output += HiddenLayer[k] * BestIndividual[gene];
gene++;
}
//if the output neuron fires, the predictions array will hold the output for later use
if (Output > BestIndividual[gene])
{
predictions[g] = 1;
}
else
{
predictions[g] = 0;
}
}
//combines the array that holds all predictions from all previously evolved networks with the predictions from the latest one
for (int k = 0; k < 2032; k++)
{
OldPredictions[k] = predictions[k] + OldPredictions[k];
}
//creates the file that will contain the final prediction output
using (FileStream stream = new FileStream(@"C:/Prediction - Project Nova.txt", FileMode.Create))
using (TextWriter writer = new StreamWriter(stream))
{
writer.WriteLine("");
}
//puts the predictions in the text file
for (int t = 0; t < 2032; t++)
{
File.AppendAllText("C:/Prediction - Project Nova.txt", OldPredictions[t] + Environment.NewLine);
}
//continues the loop until the ensemble is complete
}
Console.ReadKey();
}
}
}
Appendix C – Bootstrap code
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Timers;
using System.Text;
namespace Bootstrap
{
class Program
{
static void Main()
{
int n=0;
using (TextReader r = File.OpenText("C:/bst.txt"))
{
string line;
while ((line = r.ReadLine()) != null)
{
n++;
}
Console.WriteLine("There are " + n + " values in the file");
}
double[] tabl = new double[n];
Random place = new Random();
double mean=0;
int maxits=500000;
DateTime tid;
using (TextReader r = File.OpenText("C:/bst.txt"))
{
string line;
int x=-1;
while ((line = r.ReadLine()) != null)
{
x = x + 1;
//Console.WriteLine(line);
double.TryParse(line, out tabl[x]);
}
}
//calculate mean
for(int i=0;i<n;i++)
{
mean += tabl[i];
}
mean = (mean / n);
Console.WriteLine("The mean that will be tested is " + mean);
//subtracts the mean from the array values
for (int i = 0; i < n; i++)
{
tabl[i] -= mean;
}
int pvalue=0;
tid=DateTime.Now;
for(int its=0;its<maxits;its++)
{
double sum=0;
for(int i=0;i<n;i++)
{
int r = place.Next(0, n);
sum += tabl[r];
//Console.WriteLine(r);
}
if((sum/n) >= mean)
{
pvalue++;
}
}
TimeSpan elsp = DateTime.Now - tid;
Console.WriteLine(elsp);
Console.WriteLine("The p-value is calculated to be " + ((decimal)pvalue / (decimal)maxits));
Console.WriteLine(1 - ((decimal)pvalue / (decimal)maxits));
Console.ReadKey();
}
}
}