a trading system based on ann

Master in Computer Engineering

Computing Project (Proj-H-402)

A trading system based on technical indicators and neural networks

Student: Michel Halmes

Supervisors: Mauro Birattari, Michele Pace

May 18, 2012

Contents

1 Introduction
    1.1 What is Technical Analysis?
    1.2 How does it work?

2 Technical indicators
    2.1 Accumulation Distribution Line (ADL)
    2.2 Average Directional Index (ADX)
    2.3 Bollinger band (%B)
    2.4 Relative Strength Index
    2.5 Money Flow Index (MFI)
    2.6 Chaikin Money Flow (CMF)
    2.7 Force Index
    2.8 Moving Average Convergence-Divergence (MACD)
    2.9 Stochastic Oscillators

3 Implementation and evaluation of basic trading systems
    3.1 Stop-loss and take-profit
    3.2 Money management
    3.3 Performance measures
    3.4 Test candidates
    3.5 Results of testing

4 A simple approach based on Neural Networks
    4.1 A brief explanation of Neural Networks
    4.2 How to train a Neural Network?
    4.3 The training system
    4.4 Results of the neural network system

5 Neural Network training based on genetic algorithms
    5.1 Overview of the approach
    5.2 The genetic algorithm
    5.3 The objective function
    5.4 An additional restart feature

6 Performance tests of the Neural Network
    6.1 Approach
    6.2 Robustness of the trading system
    6.3 Potential remedies against the low robustness
        6.3.1 Grouped training
        6.3.2 Control via the objective function
        6.3.3 Complexity of the network

7 Conclusion
    7.1 Why technical indicators and neural networks do not work together
    7.2 A personal comment on efficient markets


The aim of this project is purely academic. It assesses to what extent technical indicators can be used in combination with neural networks in order to set up a trading system. It will be found that such an approach is not straightforward and is very unlikely to be useful as the basis for a fully working trading system. Yet, this project contributes a well-performing genetic algorithm which can be used to train a neural network. This report explains the main issues which are faced when neural networks and technical indicators are combined.

This report is organized as follows. The first section briefly explains what a trading system is and lays out some common approaches. Thereafter, the technical indicators that are used in this study are presented. A first approach then attempts to use the back-propagation algorithm to train the neural network on the financial data. As this approach is inconclusive, an approach based on a genetic algorithm is presented subsequently. This approach raises some unexpected issues, which are discussed together with attempts to resolve them. Finally, a conclusion is drawn on how successfully neural networks and technical indicators can be used together.

1 Introduction

1.1 What is Technical Analysis?

The use of financial data (such as prices, timing and the volume of past transactions) to forecast future price evolution is called technical analysis. It can be applied to different kinds of markets such as stock markets, foreign exchange markets and derivatives markets. This analysis, however, focuses on shares only.

The technical analysis view stands in contrast to the so-called fundamental analysis. Under this view, one uses data about the state of the economy (such as growth and inflation rates), but mostly data about the company itself (such as its financial statements, past dividends, its products, its competitors, and its strategy), in order to predict price evolution.

Both views are in strong opposition to the modern portfolio theory of finance and its Efficient Market Hypothesis. The latter states that prices are essentially unpredictable, and that active portfolio management has no added value. The discussion about which of the schools is right or wrong, or rather to what extent each of them is right, is left to the reader. Yet, the conclusion provides a personal view on efficient markets. In addition, the most important arguments each school uses to defend its position are presented in this introduction.

The Efficient Market Hypothesis argues that all publicly available information is reflected in the price. In its strong form it even supposes that all privately available information is immediately reflected in prices, such that even insiders cannot take advantage of their privileged information. This is because rational individuals would immediately buy or sell an asset if a piece of information that becomes available made the current price unjustified.

The arguments technical analysts use to justify their approach are related to behavioral finance. Technicians argue that market participants are not as rational as the Efficient Market Hypothesis supposes them to be. Individuals need time to process the bulk of information that can influence asset prices. Information is therefore not reflected immediately in prices as the Efficient Market Hypothesis states. Moreover, market participants tend to exaggerate their perception of good and bad news. This might cause the existence of positive and negative price bubbles, at least in the short run. Alan Greenspan, a former Federal Reserve Chairman, gave this phenomenon the name Irrational Exuberance. Feelings and past experience also play a big role when individuals trade assets. Individuals tend to hesitate when new information is received and to use it only with a certain delay in order to observe the reactions of other traders first. Technical analysts therefore argue that markets react only slowly to information. Moreover, the tendency of adaptation to new information can be detected early in prices and volumes, and the time of adaptation is sufficiently long to make profitable trades.

The technical-analysis view, and to some extent also the fundamental-analysis view, can be integrated in a trading system. A trading system (or more precisely an algorithmic trading system) uses an electronic platform that generates trading orders in financial markets based on algorithms and without human intervention. Those algorithms use input data such as prices, timing and the volume of past transactions.

The big advantage of trading systems is that they exclude all emotions from trading. Emotions are frequently cited as one of the biggest flaws of individual investors. Decisions are often biased by risk aversion, mood and wrong interpretations of past experiences. By following a pre-defined system, human inefficiencies can be overcome. The disadvantage of trading systems is that they are difficult to develop and require a deep understanding of technical analysis.

1.2 How does it work?

There exist many different approaches to technical analysis. As an example, some look for support and resistance lines. A support line is a price below which a share price rarely goes. If the price approaches such a support from above, the price is likely to reverse upwards. The opposite is true for resistance lines, above which a share price rarely goes. Traders can thus make profits while the share fluctuates between its support and its resistance. Such a system executes many trades with small but consistent profits.

Other technical analysts try to identify trends, that is, longer periods during which the share price moves in the same direction. Of course, longer trends are relatively rare, as most price movements reverse after a while. A trading system based on trends therefore differs from the previous one in that only a few trades are profitable, but those that are bring large profits.

It is important that a trading strategy is adapted to a specific market state. One distinguishes market states based on two criteria. The first criterion distinguishes stable and trending markets. In contrast to the latter, which has already been explained, a stable market is characterized by a share price which remains within a certain range. The second criterion distinguishes quiet and volatile markets. A quiet market is one in which prices either remain within a small range, in the case of a stable market, or trend without any severe retracements or opposite price movements. Volatile markets are the opposite.


Figure 1: The evolution of the IBM share price. One can observe support and resistance lines between which the price oscillates. When a trend begins, support and resistance lines become obsolete.

To come back to the previous examples, a trading system based on support and resistance lines (or more generally the class of so-called counter-trend systems) works best in stable and volatile markets, while trend-following systems work best in trending and quiet markets.

To make their order decisions, some technical analysts try to detect patterns in charts. One approach, which is not presented here, makes use of candlesticks, a particular representation of price movements. A candlestick is composed of a rectangle (the real body) and two lines above and below this body (the shadows). The body indicates the open and the close price¹. The color of the body indicates which one is the lower. If the body is filled, the stock closed lower than its opening price. When it is hollow, the opposite holds. The shadows indicate the level of the highest and the lowest price. The shape of those candlesticks can then be used to make order decisions.

Another approach uses moving averages (MA) in order to detect trends. The moving average shows the ‘average’ price over a certain period. One generally combines a shorter and a longer moving average. The short MA moves closely with the price, while the longer one has more inertia. If the shorter MA is above the longer one, it generally means that the share is in an uptrend. This trend reverses when the fast MA moves below the slow MA again.

¹ The open price is the price at the beginning of the day and the close price is the price at the end of the day.

Figure 2: Candlesticks. (a) Interpretation of candlesticks. (b) Candlestick representation for PowerShares QQQ Trust.

There exist two types of moving averages over N days. The Simple Moving Average (SMA) is obtained by computing the average of the price over the last N days. The Exponential Moving Average (EMA) is obtained by attributing decreasing weights to the prices situated far in the past. The first value of the EMA must be initialized to the (simple) moving average over the N previous days. For the following days, the EMA is then updated using some weight α:

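\begin{align}
\mathrm{EMA}_N &= \frac{1}{N}\sum_{t=1}^{N} P_t \tag{1} \\
\mathrm{EMA}_t &= \alpha P_t + (1-\alpha)\,\mathrm{EMA}_{t-1}, \qquad t > N \tag{2} \\
&= \alpha P_t + \alpha(1-\alpha) P_{t-1} + \alpha(1-\alpha)^2 P_{t-2} \tag{3} \\
&\quad + \cdots + \alpha(1-\alpha)^{t-N-1} P_{N+1} + (1-\alpha)^{t-N}\,\mathrm{EMA}_N \tag{4}
\end{align}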

This average needs a few days in order to converge to its true value and to decrease the influence of the initialization value. Of course, this average does not depend on a fixed number of days. It depends on far more than N days, but more recent prices have a higher influence than prices situated far in the past. By convention, the weight is set to α = 2/(N + 1). In this case about 86% of the total weight is attributed to the last N days.
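The recursion in equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sma`, `ema` and `weight_on_last_n` are hypothetical and not taken from the project's code, and prices are assumed to be a plain list of daily closes.

```python
def sma(prices, n):
    """Simple moving average; element i is the mean of prices[i : i + n]."""
    return [sum(prices[i:i + n]) / n for i in range(len(prices) - n + 1)]


def ema(prices, n):
    """Exponential moving average with alpha = 2 / (n + 1).

    The first value is the SMA of the first n prices (equation 1);
    every later value follows the recursion of equation 2.
    Element 0 of the result corresponds to prices[n - 1].
    """
    if len(prices) < n:
        return []
    alpha = 2.0 / (n + 1)
    out = [sum(prices[:n]) / n]
    for p in prices[n:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def weight_on_last_n(n):
    """Share of the total EMA weight carried by the n most recent prices."""
    alpha = 2.0 / (n + 1)
    return 1 - (1 - alpha) ** n
```

Evaluating `weight_on_last_n(20)` gives roughly 0.86, which matches the 86% figure quoted above.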

Both averages move relatively closely together, the only difference being that the EMA is more reactive to recent price movements. Figure 3 below shows a trading system based on moving averages.

A somewhat more sophisticated method is based on so-called technical indicators. Those are indicators based on past prices (mostly several prices for each day, such as open, close, high, and low) and past trading volumes. Some are even based on moving averages.

2 Technical indicators

In this section, ten different technical indicators will be presented. For each of those indicators, its interpretation, its computation and a decision rule based on it will be provided. To keep this report short, the indicators are not illustrated with real-life examples. The interested reader is referred to http://stockcharts.com/school, which illustrates the indicators presented here and many more. The decision rules used here are those presented on this website.


Figure 3: Trading system based on 30-day and 100-day EMA for Inter-Telecom

2.1 Accumulation Distribution Line (ADL)

The Accumulation Distribution Line is a volume-based indicator designed to measure the cumulative flow of money into and out of a share.

Computation:

1. Money Flow Multiplier = [(Close - Low) - (High - Close)] /(High - Low)

2. Money Flow Volume = Money Flow Multiplier x Volume for the Period

3. ADL = Previous ADL + Current Period’s Money Flow Volume

The Money Flow Multiplier fluctuates between +1 and -1. The multiplier is positive when the close is in the upper half of the high-low range and negative when it is in the lower half. It is a measure of buying and selling pressure. The buying pressure is stronger than the selling pressure when prices close in the upper half of the day’s range (and vice versa). Weighting the multiplier by the volume reinforces this measure. A high positive multiplier combined with a high volume shows strong buying pressure that pushes the indicator higher. Conversely, a low negative number combined with a high volume reflects strong selling pressure that pushes the indicator lower.

Decision rule:

• IF ADL > 14-day EMA of ADL THEN buy

• IF ADL < 14-day EMA of ADL THEN sell
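The three computation steps and the decision rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, the inputs are assumed to be equally long lists of daily values, and a day with High equal to Low is assumed to contribute a multiplier of zero.

```python
def ema(values, n):
    """EMA seeded with the SMA of the first n values; alpha = 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [sum(values[:n]) / n]
    for v in values[n:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def adl(high, low, close, volume):
    """Cumulative sum of Money Flow Volume (steps 1 to 3)."""
    line, total = [], 0.0
    for h, l, c, v in zip(high, low, close, volume):
        # Assumption: a flat day (high == low) adds nothing to the line.
        multiplier = ((c - l) - (h - c)) / (h - l) if h != l else 0.0
        total += multiplier * v
        line.append(total)
    return line


def adl_signals(high, low, close, volume, n=14):
    """'buy' when the ADL is above its n-day EMA, 'sell' when below."""
    line = adl(high, low, close, volume)
    if len(line) < n:
        return [None] * len(line)
    average = [None] * (n - 1) + ema(line, n)
    signals = []
    for value, avg in zip(line, average):
        if avg is None or value == avg:
            signals.append(None)
        else:
            signals.append("buy" if value > avg else "sell")
    return signals
```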


2.2 Average Directional Index (ADX)

The Average Directional Index measures the trend strength without regard to the trend direction. Two other indicators, the Plus Directional Indicator (+DI) and the Minus Directional Indicator (-DI), complement the ADX by defining the trend direction.

Computation:

1. Calculate the True Range (TR), which is defined as the greatest of the following:

• Method 1: Current High less the current Low

• Method 2: Current High less the previous Close (absolute value)

• Method 3: Current Low less the previous Close (absolute value)

2. Calculate the Plus Directional Movement (+DM) and the Minus Directional Movement (-DM) for each period.

• Directional movement is positive (plus) when the current high minus the prior high is greater than the prior low minus the current low. This so-called Plus Directional Movement (+DM) then equals the current high minus the prior high, provided it is positive. A negative value would simply be entered as zero.

• Directional movement is negative (minus) when the prior low minus the current low is greater than the current high minus the prior high. This so-called Minus Directional Movement (-DM) equals the prior low minus the current low, provided it is positive. A negative value would simply be entered as zero.

3. Smooth these periodic values using Wilder’s smoothing technique. For instance, for the True Range this is done as follows:

• First TR14 = Sum of first 14 periods of TR1

• Subsequent Values = Prior TR14 - (Prior TR14/14) + Current TR1

4. Divide the 14-day smoothed Plus Directional Movement (+DM) by the 14-day smoothed True Range to find the 14-day Plus Directional Indicator (+DI14). Multiply by 100 to move the decimal point two places. Do the same for the 14-day smoothed Minus Directional Movement (-DM).

5. The Directional Movement Index (DX) equals the absolute value of +DI14 less -DI14, divided by the sum of +DI14 and -DI14, and multiplied by 100 so that it lies between 0 and 100.

6. After all these steps, it is time to calculate the Average Directional Index (ADX) as follows:

• First ADX14 = 14-period average of DX

• Subsequent ADX14 = ((Prior ADX14 x 13) + Current DX Value) / 14

The Average Directional Index (ADX) is used to measure the strength or weakness of a trend, not the actual direction. The direction of the trend is defined by +DI and -DI.


Decision rule:

• IF ADX > 20 & DI+ > DI- THEN buy

• IF ADX > 15 & DI+ < DI- THEN sell
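The six steps and the decision rule can be put together in a short Python sketch. It assumes equally long lists of daily highs, lows and closes and scales DX to the range 0 to 100; the function names are hypothetical and the code is meant as an illustration of the procedure rather than as the system actually used in this study.

```python
def wilder_sum(values, n):
    """Wilder-smoothed running sum; the first output is the plain n-period sum."""
    smoothed = [sum(values[:n])]
    for v in values[n:]:
        prev = smoothed[-1]
        smoothed.append(prev - prev / n + v)
    return smoothed


def adx(high, low, close, n=14):
    """Return (ADX, +DI, -DI), aligned on the days for which an ADX exists."""
    tr, plus_dm, minus_dm = [], [], []
    for t in range(1, len(close)):
        tr.append(max(high[t] - low[t],
                      abs(high[t] - close[t - 1]),
                      abs(low[t] - close[t - 1])))
        up = high[t] - high[t - 1]
        down = low[t - 1] - low[t]
        plus_dm.append(up if up > down and up > 0 else 0.0)
        minus_dm.append(down if down > up and down > 0 else 0.0)
    if len(tr) < 2 * n - 1:  # not enough data for a first ADX value
        return [], [], []
    tr_s = wilder_sum(tr, n)
    plus_di = [100.0 * p / r if r else 0.0
               for p, r in zip(wilder_sum(plus_dm, n), tr_s)]
    minus_di = [100.0 * m / r if r else 0.0
                for m, r in zip(wilder_sum(minus_dm, n), tr_s)]
    dx = [100.0 * abs(p - m) / (p + m) if p + m else 0.0
          for p, m in zip(plus_di, minus_di)]
    adx_values = [sum(dx[:n]) / n]
    for d in dx[n:]:
        adx_values.append((adx_values[-1] * (n - 1) + d) / n)
    k = len(adx_values)
    return adx_values, plus_di[-k:], minus_di[-k:]


def adx_signal(adx_value, plus_di, minus_di):
    """Decision rule with the thresholds 20 (buy) and 15 (sell) given above."""
    if adx_value > 20 and plus_di > minus_di:
        return "buy"
    if adx_value > 15 and plus_di < minus_di:
        return "sell"
    return None
```

Note that a first ADX value requires at least 2 x 14 = 28 days of prices, since both the directional indicators and the ADX itself need a 14-period warm-up.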

2.3 Bollinger band (%B)

The %B compares the current price level with an upper and a lower band. Those bands are the Bollinger bands, which are set 2 standard deviations above and below the 20-day simple moving average.

Computation:

1. %B = (Price - Lower Band)/(Upper Band - Lower Band)

%B can be used to identify overbought and oversold situations. A share is overbought if its price is too high and is likely to decline. The opposite holds for oversold shares.

Decision rule:

• The %B is usually combined with the RSI or the MFI presented below. A share is considered overbought if it is situated in the upper 20% between the bands and oversold if situated in the lower 20%.

2.4 Relative Strength Index

The Relative Strength Index (RSI) is an indicator that measures the speed and change of price movements.

Computation:

1. Compute the Average Gain and the Average Loss as follows:

• First Average Gain = Sum of Gains over the past 14 periods / 14.

• Subsequent Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.

• The Average Loss is computed in the same way.

2. RS = Average Gain / Average Loss

3. RSI = 100 - 100/(1 + RS)

The RSI is situated between 0 and 100. It can be combined with the %B in order to form a trading system. It indicates again whether a share is overbought or oversold. It is considered oversold if the indicator goes below 0.2 and overbought above 0.8 (here 0.7 is used to have a more cautious trading rule).

Decision rule:

• IF %B < 0.2 & RSI < 25 THEN buy

• IF %B > 0.7 & RSI> 70 THEN sell
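A possible way to compute %B and the RSI and to combine them into the rule above is sketched below. The function names are hypothetical, the standard deviation is assumed to be the population standard deviation of the 20-day window, and %B is left undefined (None) when the two bands coincide.

```python
import math


def percent_b(close, n=20, k=2.0):
    """%B of the close relative to Bollinger bands (n-day SMA +/- k std)."""
    out = [None] * len(close)
    for t in range(n - 1, len(close)):
        window = close[t - n + 1:t + 1]
        mean = sum(window) / n
        # Assumption: population standard deviation over the window.
        std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
        upper, lower = mean + k * std, mean - k * std
        if upper > lower:
            out[t] = (close[t] - lower) / (upper - lower)
    return out


def rsi(close, n=14):
    """RSI with the averaging scheme described above (None during warm-up)."""
    if len(close) <= n:
        return [None] * len(close)
    gains = [max(close[t] - close[t - 1], 0.0) for t in range(1, len(close))]
    losses = [max(close[t - 1] - close[t], 0.0) for t in range(1, len(close))]

    def value(avg_gain, avg_loss):
        if avg_loss == 0:
            return 100.0  # no losses at all: RS is unbounded
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    avg_gain, avg_loss = sum(gains[:n]) / n, sum(losses[:n]) / n
    out = [None] * n + [value(avg_gain, avg_loss)]
    for g, l in zip(gains[n:], losses[n:]):
        avg_gain = (avg_gain * (n - 1) + g) / n
        avg_loss = (avg_loss * (n - 1) + l) / n
        out.append(value(avg_gain, avg_loss))
    return out


def bb_rsi_signal(pct_b, rsi_value):
    """Decision rule combining %B and RSI with the thresholds given above."""
    if pct_b is None or rsi_value is None:
        return None
    if pct_b < 0.2 and rsi_value < 25:
        return "buy"
    if pct_b > 0.7 and rsi_value > 70:
        return "sell"
    return None
```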


2.5 Money Flow Index (MFI)

The Money Flow Index is an indicator that uses both price and volume to measure buying and selling pressure. It can be considered a volume-weighted form of the RSI.

Computation:

1. Typical Price = (High + Low + Close)/3

2. Raw Money Flow = Typical Price x Volume

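3. Positive Money Flow = Sum of the Raw Money Flow over those of the last 14 periods in which the Typical Price rose compared with the previous period.

4. Negative Money Flow = Sum of the Raw Money Flow over those of the last 14 periods in which the Typical Price fell compared with the previous period.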

5. Money Flow Ratio = (Positive Money Flow)/(Negative Money Flow)

6. Money Flow Index = 100 - 100/(1 + Money Flow Ratio)

The MFI is situated between 0 and 100 and can also be combined with the %B in order to form a trading system.

Decision rule:

• IF %B < 0.2 & MFI < 25 THEN buy

• IF %B > 0.7 & MFI > 70 THEN sell
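The six steps of the MFI computation can be sketched as follows. The sketch assumes that a period counts as positive when its Typical Price is higher than that of the previous period and as negative when it is lower; the function name is hypothetical.

```python
def mfi(high, low, close, volume, n=14):
    """Money Flow Index over n periods (None while there is not enough data)."""
    typical = [(h + l + c) / 3.0 for h, l, c in zip(high, low, close)]
    raw = [p * v for p, v in zip(typical, volume)]
    out = [None] * len(close)
    for t in range(n, len(close)):
        positive = negative = 0.0
        for i in range(t - n + 1, t + 1):
            if typical[i] > typical[i - 1]:
                positive += raw[i]
            elif typical[i] < typical[i - 1]:
                negative += raw[i]
        if negative == 0:
            out[t] = 100.0  # only rising periods in the window
        else:
            out[t] = 100.0 - 100.0 / (1.0 + positive / negative)
    return out
```

The resulting values can be fed into the same kind of rule as for the RSI, with the thresholds 25 and 70 listed above.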

2.6 Chaikin Money Flow (CMF)

The Chaikin Money Flow combines price and volume to show how money may be flowing into or out of a stock. It is an alternative to the Accumulation Distribution Line, i.e. it is a measure of buying and selling pressure.

Computation:

1. Money Flow Multiplier = [(Close - Low) - (High - Close)] /(High - Low)

2. Money Flow Volume = Money Flow Multiplier x Volume for the Period

3. 20-period CMF = 20-period Sum of Money Flow Volume / 20-period Sum of Volume

The resulting indicator fluctuates above/below the zero line.

Decision rule:

• IF CMF > 0.05 THEN buy

• IF CMF < -0.05 THEN sell
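As an illustration, the CMF and its decision rule might be computed as in the sketch below. The function names are hypothetical, and a day with High equal to Low is assumed to contribute a Money Flow Volume of zero.

```python
def cmf(high, low, close, volume, n=20):
    """n-period Chaikin Money Flow (None while there is not enough data)."""
    money_flow_volume = []
    for h, l, c, v in zip(high, low, close, volume):
        multiplier = ((c - l) - (h - c)) / (h - l) if h != l else 0.0
        money_flow_volume.append(multiplier * v)
    out = [None] * len(close)
    for t in range(n - 1, len(close)):
        total_volume = sum(volume[t - n + 1:t + 1])
        if total_volume:
            out[t] = sum(money_flow_volume[t - n + 1:t + 1]) / total_volume
    return out


def cmf_signal(value, threshold=0.05):
    """'buy' above +threshold, 'sell' below -threshold, otherwise no signal."""
    if value is None:
        return None
    if value > threshold:
        return "buy"
    if value < -threshold:
        return "sell"
    return None
```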

2.7 Force Index

The Force Index uses price and volume to assess the power behind a move and to identify possible turning points.


Computation:

1. Force Index(1) = [Close (current period) - Close (prior period)] x Volume

2. Force Index(13) = 13-period EMA of Force Index(1)

The Force Index combines three elements into one indicator so as to measure selling and buying pressure. First, there is either a positive or a negative price change. A positive price change signals that buyers were stronger than sellers, while a negative price change signals that sellers were stronger than buyers. Second, there is the extent of the price change, which is simply the current close less the prior close. The higher the price change, the higher the corresponding pressure. The third and final element is volume, which measures commitment.

Decision rule:

• IF force(13) > 0 THEN buy

• IF force(13) < 0 THEN sell

Typical implementations of the Force Index use a 13-day or a 20-day EMA.
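The two computation steps can be sketched as follows, with the EMA seeded by a simple average as described in the introduction. The function names are hypothetical; the sign of the last returned value gives the buy or sell decision of the rule above.

```python
def ema(values, n):
    """EMA seeded with the SMA of the first n values; alpha = 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [sum(values[:n]) / n]
    for v in values[n:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def force_index(close, volume, n=13):
    """n-period EMA of (close change x volume), padded with None at the start."""
    raw = [(close[t] - close[t - 1]) * volume[t] for t in range(1, len(close))]
    if len(raw) < n:
        return [None] * len(close)
    smoothed = ema(raw, n)
    return [None] * (len(close) - len(smoothed)) + smoothed
```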

2.8 Moving Average Convergence-Divergence (MACD)

The MACD is all about the convergence and divergence of two moving averages (one slow and one fast).

Computation:

1. MACD Line: 12-day EMA - 26-day EMA of prices

2. Signal Line: 9-day EMA of MACD Line

3. MACD Histogram: MACD Line - Signal Line

The MACD Line measures the difference between a short and a long moving average. Such an indicator would be used in a normal moving average trading system. However, this indicator is too slow to identify the beginnings and reversals of trends soon enough. The MACD Histogram measures the convergence and divergence of the MACD Line relative to its Signal Line. It fluctuates above and below the zero line. A positive MACD Histogram indicates that the MACD Line is increasing, which means that one is potentially in an uptrend. An increase of a positive MACD Histogram indicates that the increase of the MACD Line is accelerating, which indicates a stronger uptrend. The opposite holds for negative and for decreasing negative MACD Histograms.

Decision rule:

• IF MACD-hist > 0 THEN buy

• IF MACD-hist < 0 THEN sell
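The MACD Line, Signal Line and Histogram can be computed as in the sketch below. It is only an illustration with hypothetical function names; each EMA is assumed to be seeded with a simple average, so the histogram is available from day 26 + 9 - 1 = 34 onwards.

```python
def ema(values, n):
    """EMA seeded with the SMA of the first n values; alpha = 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [sum(values[:n]) / n]
    for v in values[n:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out  # out[0] corresponds to values[n - 1]


def macd_histogram(close, fast=12, slow=26, signal=9):
    """MACD Histogram, padded with None so that it lines up with `close`."""
    if len(close) < slow + signal - 1:
        return [None] * len(close)
    fast_ema = ema(close, fast)[slow - fast:]  # drop days before the slow EMA exists
    slow_ema = ema(close, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    hist = [m - s for m, s in zip(macd_line[signal - 1:], signal_line)]
    return [None] * (len(close) - len(hist)) + hist


def macd_signal(hist_value):
    """'buy' for a positive histogram, 'sell' for a negative one."""
    if hist_value is None or hist_value == 0:
        return None
    return "buy" if hist_value > 0 else "sell"
```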


2.9 Stochastic Oscillators

The Stochastic Oscillator is an indicator that shows the location of the close relative to the high-low range over a number of periods.

Computation:

1. %K = (Current Close - Lowest Low)/(Highest High - Lowest Low) * 100

• Lowest Low = lowest low over the 14-day look-back period

• Highest High = highest high over the 14-day look-back period

2. fast%D = 3-day SMA of %K

3. slow%D = 3-day SMA of fast%D

The Stochastic Oscillator takes values between 0 and 100. It can be used to identify oversold and overbought shares. A share is overbought if the Stochastic Oscillator is above 80 and oversold if it is below 20. As their names indicate, the slow %D is smoother than the fast %D. One can base a trading system on either of the two indicators.

Decision rule:

• IF %D crosses from below 20 to above 20 THEN buy

• IF %D crosses from above 80 to below 80 THEN sell
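The computation and the crossing rule can be sketched as follows. The function names are hypothetical; %K is assumed to be 50 when the highest high equals the lowest low, and a crossing is detected by comparing two consecutive values of %D, where a value exactly on the threshold is treated as having crossed it.

```python
def sma(values, n):
    """n-period simple moving average that propagates leading None values."""
    out = []
    for t in range(len(values)):
        window = values[max(0, t - n + 1):t + 1]
        if len(window) < n or any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / n)
    return out


def stochastic(high, low, close, n=14, smooth=3):
    """Return (%K, fast %D, slow %D) as lists aligned with `close`."""
    k = [None] * len(close)
    for t in range(n - 1, len(close)):
        highest = max(high[t - n + 1:t + 1])
        lowest = min(low[t - n + 1:t + 1])
        # Assumption: a flat window puts the close in the middle of the range.
        k[t] = 100.0 * (close[t] - lowest) / (highest - lowest) if highest > lowest else 50.0
    fast_d = sma(k, smooth)
    slow_d = sma(fast_d, smooth)
    return k, fast_d, slow_d


def cross_signals(d, lower=20.0, upper=80.0):
    """'buy' when %D crosses up through `lower`, 'sell' when it crosses down through `upper`."""
    signals = [None] * len(d)
    for t in range(1, len(d)):
        prev, cur = d[t - 1], d[t]
        if prev is None or cur is None:
            continue
        if prev < lower <= cur:
            signals[t] = "buy"
        elif prev > upper >= cur:
            signals[t] = "sell"
    return signals
```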

3 Implementation and evaluation of basic trading systems

This section analyzes the performance of the trading systems based on the technical indicators which have been presented above. All those systems are based on very simple IF-THEN rules. Before that, an important additional feature and some performance measures are introduced.

3.1 Stop-loss and take-profit

Two important aspects of trading systems which have not been presented yet are the stop-loss and take-profit features. As their names indicate, the stop-loss ends a position that has been taken and has generated too many losses so far, and the take-profit ensures that a position which has generated good profits so far is sold at a good moment in order to cash in these profits.

There exist many different types of stop-loss and take-profit, and the right choice of those features is crucial for the success of a trading system. However, the aim of this project is not to discuss different forms of those features. Therefore, a simple but still widely implemented solution is used: the so-called trailing-stop. The trailing-stop assumes the roles of stop-loss and take-profit at the same time. The trailing-stop sells the position when the current price falls by more than a certain percentage below the maximum price reached during the position. This percentage has been set to 5%, as it gave the most satisfying results. This means that the maximum loss that can be made on a trade is 5%. This is the stop-loss function. The take-profit function ensures that the profit is cashed in when the share has decreased by 5% from its peak.

The percentage value chosen for the trailing-stop must strike a balance. If its value is too high, the system can incur huge losses and is hence very risky. Moreover, the take-profit is inefficient, as the system loses a lot of its value compared to the peak which has been attained during a trade. On the other hand, if its value is too low, the system will exit a position at the first price fall which is observed, even though prices might continue to increase afterwards. The trades a system makes are then too short and the potential gain too small to recover its trading costs.
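A trailing-stop of this kind can be sketched in a few lines. The sketch assumes that the stop is checked once per day on a list of daily prices and that each trade invests 100 units; the function names are hypothetical.

```python
def trailing_stop_exit(prices, entry_index, stop=0.05):
    """Index of the first day on which the price is `stop` below the running peak.

    Returns None if the stop is never triggered before the data ends.
    """
    peak = prices[entry_index]
    for t in range(entry_index + 1, len(prices)):
        peak = max(peak, prices[t])
        if prices[t] <= peak * (1 - stop):
            return t
    return None


def trade_profit(prices, entry_index, exit_index, stake=100.0):
    """Profit of a trade that invests `stake` units at the entry price."""
    return stake * (prices[exit_index] / prices[entry_index] - 1)
```

For the prices 100, 110, 120, 115, 113 the position entered on the first day is closed at 113, that is after a fall of more than 5% from the peak of 120, with a profit of 13 units. Because the stop is only checked on observed prices in this sketch, a large fall from one day to the next can close the trade somewhat below the 5% level.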

3.2 Money management

Another important aspect that has not been mentioned yet is the money management system. How much to buy is a crucial question for the success of a trading system. But as the aim of this project is purely academic, a simple rule has also been used here. At each buy signal, the share is bought for 100 units. This gives an equal weight to each trading decision and is therefore suited for evaluating the trading system. At the same time, all figures such as profit or loss amounts can immediately be interpreted as a percentage. This rule is, however, a strong simplification for several reasons. First, one cannot simply buy a share for €100 or $100 but only for a multiple of its share price, which could even be above €100 or $100. Second, this assumption does not take into account that the capital available for the system might be insufficient, especially after several successive losses. What is also not taken into account is the question of diversification. It might not be a very smart idea to let a system buy large amounts of shares of companies operating in the same sector. A very important principle in finance is diversification, that is, not to put “all eggs in the same basket”. However, taking account of those factors would be beyond the scope of this project.

In order to simplify further, and also because €100 or $100 is too small an amount to be realistic, this study does not take trading costs into account either. However, the number of trades will always be reported such that the computation of trading costs is straightforward.

3.3 Performance measures

In order to compare different trading systems, this section presents some metrics that help to evaluate trading systems. The metrics which are reported here are the following [11]:

Total net profit examines profitability irrespective of the risk taken to achieve the results. It is useful to quickly compare various portfolio component results without additional calculations. As the profit must be put in relation to the duration, the annualized profit will be presented here. If a trading system runs over N days and generates a profit of π while investing each time 100 units, the annualized profit is given by:

\begin{equation}
\mathrm{annR} = \left(1 + \frac{\pi}{100}\right)^{252/N} - 1 \tag{5}
\end{equation}

The fraction in the exponent represents the inverse of the length of the trading period in years, where it has been supposed that one year is composed of 252 working (trading) days.

Number of trades (# Trades) shows the total number of trades taken during the testing period.

Number of days (# Days) shows the average duration of a trade. As with the number of trades, all else being equal, the lower the number of days in a trade while still generating superior results, the better.

Maximum drawdown amount (Max Draw) tells us the maximum peak-to-valley equity drawdown during the testing period. This number defines our absolute minimum capitalization requirements to trade the system.

Maximum drawdown duration (MDD) is the longest duration of a drawdown in equity prior to the achievement of a new equity peak. This number is essential in psychologically preparing us for how long we must wait to experience a new peak in account equity.

Maximum consecutive losses (MCL) is the maximum number of consecutive losses endured throughout the testing period. Just like the MDD, it is important in dispelling any fantasies regarding a system’s ability to jump continuously from equity peak to ever higher peaks. The MCL shows ahead of time exactly how many consecutive losses successful traders would have endured to enjoy the system’s total net profit.

Profit to maximum drawdown (P:MD) refers to the average profit to maximum drawdown ratio. The higher this ratio is, the better. This is probably the most important field listed because it allows one to examine the profit in relation to the risk endured to achieve that profitability.

Profit loss ratio (P:L ratio) refers to the average profit to average loss ratio. As with P:MD, the higher these numbers are, the better. Trend-following systems should have very good P:L ratios because they generally display a low winning percentage of trades. This means that large profits and small losses are key in generating a good P:MD ratio. These ratios will drop for counter-trend systems, but the winning percentage of trades should compensate for this.

Percent winners (%W) is the percentage of winning trades. As stated, trend systems generally will have relatively low %W’s and counter-trend systems typically display high %W’s.

Time percentage (Time %) refers to the amount of time that this system has an open position in the market. If all other fields were equal, then a lower time percentage would be preferable because it means our available capital is tied up for less time to yield the same rate of return.
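Several of these measures follow directly from the list of per-trade profits and from the cumulated equity curve. The sketch below shows one possible way to compute them. The function names are hypothetical, the equity curve is assumed to be a list with one value per day, and the drawdown duration is measured in days since the last equity peak.

```python
def annualized_return(total_profit, n_days, stake=100.0, days_per_year=252):
    """Equation (5): profit on a stake of 100 units, annualized over n_days."""
    return (1 + total_profit / stake) ** (days_per_year / n_days) - 1


def max_drawdown(equity):
    """Return (largest peak-to-valley drop, longest number of days below a peak)."""
    peak, peak_day = equity[0], 0
    worst, longest = 0.0, 0
    for day, value in enumerate(equity):
        if value >= peak:
            peak, peak_day = value, day
        else:
            worst = max(worst, peak - value)
            longest = max(longest, day - peak_day)
    return worst, longest


def trade_stats(profits):
    """Percent winners, profit:loss ratio and maximum consecutive losses."""
    wins = [p for p in profits if p > 0]
    losses = [-p for p in profits if p < 0]
    streak = longest = 0
    for p in profits:
        streak = streak + 1 if p < 0 else 0
        longest = max(longest, streak)
    ratio = None
    if wins and losses:
        ratio = (sum(wins) / len(wins)) / (sum(losses) / len(losses))
    return {
        "pct_winners": 100.0 * len(wins) / len(profits) if profits else 0.0,
        "profit_loss_ratio": ratio,
        "max_consecutive_losses": longest,
    }
```

As a worked example of equation (5), a profit of 21 units over 504 trading days (two years) corresponds to an annualized return of (1.21)^(1/2) - 1 = 10%.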

3.4 Test candidates

To test the different trading systems that will be presented in this report, 30 shares from different markets and sectors have been selected.


When importing the data, a difficulty was observed. The price data contained several break points, where the price falls considerably (see Figure 4). Those break points correspond to financial operations such as dividend distributions, capital increases, etc. They do not, however, affect the return of the share. To overcome this problem, one must use the adjusted close, which eliminates those effects. From the difference between the adjusted close and the close, one could theoretically also adjust the other prices such as open, high and low in order to be sure that the computed indicators are also “adjusted”. However, one also observes that some volatility is missing in the adjusted price before the break point. Therefore, only the last break-free period has been retained to perform the tests.

Figure 4: Evolution of the close price and the adjusted close for Target Company. One observes several break points in the close price.

Tables 1 and 2 show the characteristics of the test candidates [4]. Reported are the company code, the complete name, the market and the sector. Also reported are the market capitalization (which gives an indication of the size of the company), the number of days for which price data is available and the annualized return of the share during this period. Finally, the beta is also reported. The beta is a well-known measure of the risk of a share. More precisely, it indicates how volatile the returns of a share are compared to the volatility of the market. A share with a positive beta (which is the case for almost all companies) has returns that depend on the return of the market. A share with a beta higher than one has more volatile returns than the market, while one with a beta between 0 and 1 has less volatile returns than the market. Mathematically, the beta is given by the slope of an OLS regression of the share’s returns on the market’s returns.
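The slope of an OLS regression with a single regressor equals the covariance of the two return series divided by the variance of the market returns. A minimal sketch, with a hypothetical function name and assuming two equally long lists of periodic returns:

```python
def beta(share_returns, market_returns):
    """OLS slope of share returns regressed on market returns."""
    n = len(market_returns)
    mean_share = sum(share_returns) / n
    mean_market = sum(market_returns) / n
    covariance = sum((s - mean_share) * (m - mean_market)
                     for s, m in zip(share_returns, market_returns))
    variance = sum((m - mean_market) ** 2 for m in market_returns)
    return covariance / variance
```

A share whose returns are always twice those of the market has a beta of 2 under this definition.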

3.5 Results of testing

Table 3 shows the performance metrics of the different trading systems². It includes one system in addition to those that have been presented in the previous section. It is a 3-screen system proposed by Dr. Pace. This trading system buys

2More details in the attached Excel file Benchmark.xls

14

Cod

eN

ame

Mark

etS

ecto

rM

kt

Cap

Bet

aan

nR

(%)

#d

ays

AB

TA

bb

ott

Lab

ora-

tori

esN

YS

E100

Hea

lth

Care

94,1

5b

n$

0,3

21,0

63315

AC

KB

Ack

erm

ans

&va

nH

aare

nN

V.

BE

L20

Inve

stm

ent

2,0

9b

n¤

-17,5

2228

AC

NA

ccen

ture

Plc

.N

YS

E100

Con

sult

ing

an

dou

tsou

rcin

g43,8

5b

n$

0,8

12,7

2521

AM

ZN

Am

azon

.com

,In

c.N

YS

E100

On

lin

em

edia

re-

tail

86,4

5b

n$

16,8

93117

AX

PA

mer

ican

Ex-

pre

ssC

omp

any

NY

SE

100

Pay

men

tse

rvic

es66,9

9b

n$

1,8

4-0

,952

2823

BA

XB

axte

rIn

tern

a-ti

onal

Inc.

NY

SE

100

Hea

lth

Care

30,1

4b

n$

0,4

91,0

42558

BE

LG

Bel

gaco

mS

AB

el20

Tel

ecom

mu

nic

ati

on

7,4

8b

n¤

--1

,36

1912

BE

NF

ran

klin

Re-

sou

rces

,In

c.N

YS

E100

Inve

stm

ent

26,5

0b

n$

1,4

86,0

73401

BM

CB

MC

Sof

twar

e,In

c.N

AS

DA

QIT

solu

tion

s6,7

3b

n$

0,6

-2,0

93444

CS

CO

Cis

coS

yst

ems,

Inc.

NA

SD

AQ

Net

work

solu

-ti

on

s107,2

3b

n$

1,1

8-9

,52

2985

DD

E.

I.D

uP

ont

De

Nem

ours

&C

o.N

YS

E100

Ch

emic

als

49,3

0b

n$

1,4

5-2

,75

3557

DE

LB

Del

hai

zeG

rou

pB

el20

Ret

ail

3,6

8b

n¤

-13,4

2229

DE

LL

Del

lIn

c.N

AS

DA

QH

ard

ware

28,4

7b

n$

1,3

6-5

,92

3250

DO

WT

he

Dow

Ch

emi-

cal

Com

pan

yN

YS

E100

Ch

emic

als

42,1

2b

n$

2,3

1-0

,922

2796

FD

XF

edE

xC

orp

ora-

tion

NY

SE

100

Logis

tics

28,2

8b

n$

1,2

62,7

13073

Tab

le1:

Info

rmati

on

ab

ou

tte

stca

nd

idate

share

s(1/2)

15

Cod

eN

ame

Mark

etS

ecto

rM

kt

Cap

Bet

aan

nR

(%)

#d

ays

GE

Gen

eral

Ele

ctri

cC

omp

any

NY

SE

100

Tec

hn

olo

gy

&M

ach

iner

y204,8

5b

n$

1,5

8-1

0,3

2826

GO

OG

Goog

leIn

cN

AS

DA

QIn

tern

etse

arc

h193,8

0b

n$

1,0

813,2

3652

HP

QH

ewle

tt-P

acka

rdC

omp

any

NY

SE

100

Hard

ware

48,4

6b

n$

1,0

9-3

,57

2704

IBM

Inte

rnat

ion

alB

usi

nes

sM

a-ch

ines

Cor

p.

NY

SE

100

ITso

luti

on

s229,5

4b

n$

0,6

61,5

43065

JP

MJP

Mor

gan

Ch

ase

&C

o.N

YS

E100

Fin

an

cial

serv

ices

163,2

8b

n$

1,2

6-4

,37

2802

KB

CK

BC

Gro

epN

V.

BE

L20

Fin

an

cial

serv

ices

4,7

7b

n$

-8,3

12229

KF

TK

raft

Food

sIn

c.N

YS

E100

Food

68,3

0b

n$

0,5

41,0

32545

MC

DM

cDon

ald

’sC

or-

por

atio

nN

YS

E100

Res

tau

rants

97,4

9b

n$

0,4

15,3

83122

MO

BB

Mob

ista

rS

A.

BE

L20

Tel

ecom

mu

nic

ati

on

2,0

0b

n$

-7,0

12229

OR

CL

Ora

cle

Cor

por

a-ti

onN

AS

DA

QIT

solu

tion

s143,6

8b

n$

1,0

82,6

12843

PN

CP

NC

Fin

anci

alS

ervic

esN

YS

E100

Fin

an

cial

serv

ices

34,5

2b

n$

1,2

14,4

54707

SO

LB

Sol

vay

SA

.B

EL

20

Ch

emic

als

7,2

1b

n¤

-2,5

12565

TG

TT

arge

tC

orp

ora-

tion

NY

SE

100

Ret

ail

38,2

9b

n$

0,8

95,9

62775

UP

SU

nit

edP

arce

lS

ervic

eIn

c.N

YS

E100

Logis

tics

76,9

4b

n$

0,8

3-0

,587

2942

XO

ME

xxon

Mob

ilC

orp

orat

ion

NY

SE

100

Oil

402,0

4b

n$

0,4

95,7

62524

Tab

le2:

Info

rmati

on

ab

ou

tte

stca

nd

idate

share

s(2/2)

16

“when the share is oversold and when and the 40-day Moving average is rising”.This has been translated into the following decision rule:

• IF %B < 0.2 & MFI < 25 & EMA40 > 10-day SMA of EMA40 THEN buy

• IF %B > 0.7 & MFI > 70 THEN sell
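The decision rule of the 3-screen system can be written as a small function. This is a minimal sketch: the function and argument names are hypothetical, and the indicator values are assumed to be computed elsewhere for the current day.

```python
def three_screen_signal(pct_b, mfi, ema40, sma10_of_ema40):
    """Return "buy", "sell" or None for one day.

    pct_b          : Bollinger %B of the day
    mfi            : Money Flow Index of the day
    ema40          : 40-day exponential moving average of the close
    sma10_of_ema40 : 10-day simple moving average of ema40
    """
    if pct_b < 0.2 and mfi < 25 and ema40 > sma10_of_ema40:
        return "buy"   # oversold while the 40-day average is rising
    if pct_b > 0.7 and mfi > 70:
        return "sell"
    return None
```

Comparing EMA40 with its own 10-day average is how the rule expresses that the 40-day moving average is rising.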

The trading systems are run over all 30 candidate shares using the trailing-stop system. This means that each trading system has been tested on a total of about 86’000 days. The table reports the metrics which have been defined above. Moreover, it includes the standard deviation of the annual returns of the trading systems, which is an important metric for the risk of the trading systems. This standard deviation should be compared to the standard deviation of the annualized market return of the shares, which is at 6.5%.

The annual return of the trading system can easily be compared to the average annual market return of the shares, which is at 2.59% over the studied period. The table states this deviation from the market return. The last column provides the p-value of Welch’s t-test. This test is used with two samples having potentially different standard deviations. Its test statistic is given by:

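$$
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} \tag{6}
$$

where $\bar{X}_i$, $s_i^2$ and $N_i$ are the i-th sample mean, sample variance and sample size respectively. This statistic follows a Student t-distribution with degrees of freedom ν given by:

$$
\nu = \frac{\left(\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}\right)^2}{\dfrac{s_1^4}{N_1^2 \, (N_1 - 1)} + \dfrac{s_2^4}{N_2^2 \, (N_2 - 1)}} \tag{7}
$$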
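Equations (6) and (7) can be evaluated directly, as in the sketch below. It uses only the Python standard library and returns the statistic and the degrees of freedom. The p-value is not computed, since that requires the cumulative distribution function of the Student t-distribution from a statistics package. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t_test(sample1, sample2):
    """Return (t, nu) of Welch's test following equations (6) and (7)."""
    n1, n2 = len(sample1), len(sample2)
    # s_i^2 / N_i for each sample (variance() is the unbiased sample variance)
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    # s_i^4 / (N_i^2 (N_i - 1)) equals v_i^2 / (N_i - 1)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, nu


t_stat, dof = welch_t_test([1, 2, 3], [4, 5, 6])
```

With two samples of equal size and equal variance, ν reduces to 2(N − 1), which is 4 in this small example.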

One can make the following observations. First, all systems under-perform the market. The p-value of the t-test confirms that this difference is mostly significant. The best performing trading systems are those based on the oscillators %B and MFI, including the 3-screen system. The 3-screen system has been successfully tested by Dr. Pace in combination with more sophisticated stop-loss, take-profit and money management features. We can therefore conclude that, due to our restrictive assumptions on those features, our trading system need not necessarily beat the market in order to bear a potential for being implemented. Improving the cited features should significantly improve the performance of the trading systems that have been and will be presented in this study. With this in mind, one can consider that the trading system based on RSI also delivers good results. The results obtained from ADX and slow %D are also acceptable.

What is also striking is that the standard deviation is lower for the trading systems with good performance. This means that those systems perform relatively well on all shares, which further suggests that their success is not simply based on luck.

System     AnnRet(%)  StdDev(%)  DevMkt  #Trades  #Days  MaxDraw  MDD   MCL    P:MD   P:L     %Win   %T     P-value (a)
ADL        -3,48      9,64       -6,04   254,70   6,49   5,98     4,10  12,57  -4,44  132,10  33,41  55,39  0,007***
ADX        0,17       5,81       -2,39   82,60    12,05  5,94     4,13  8,60   3,50   10,18   35,11  33,31  0,142
%B & RSI   1,31       2,45       -1,25   11,80    16,69  5,34     4,60  3,03   37,53  67,34   45,39  6,72   0,335
%B & MFI   1,74       3,73       -0,82   26,80    14,99  5,53     2,97  4,40   19,41  28,10   45,40  13,56  0,555
Chaiki MF  -2,33      8,20       -4,89   108,63   15,01  6,16     4,83  8,33   -1,30  -0,77   37,28  52,77  0,014**
Force13    -2,31      7,48       -4,87   214,40   7,38   6,04     3,47  12,83  -0,10  0,37    28,88  53,53  0,010***
Force20    -2,01      7,02       -4,57   178,53   8,94   6,16     3,60  12,33  -0,06  1,60    28,22  53,45  0,012**
MACD-hist  -2,06      9,47       -4,62   139,27   10,55  5,88     3,10  9,47   0,59   1,77    35,49  49,06  0,032**
fast%D     -0,39      4,47       -2,95   65,07    12,75  5,59     2,77  6,23   1,62   1,73    45,33  28,13  0,047**
slow%D     0,16       4,90       -2,40   47,30    14,43  5,54     4,37  5,63   4,63   5,38    45,88  23,23  0,114
3-screen   1,72       3,73       -0,83   26,67    15,06  5,54     2,93  4,40   19,50  27,71   45,58  13,53  0,548

AnnRet = Annual Return, StdDev = Std Dev of Ann Return, DevMkt = Dev from Mkt Return.

Table 3: Average performance metric over 30 shares for the trading systems based on simple IF-THEN rules

(a) *** = significant at 1% confidence, ** = at 5%, * = at 10%

One could of course further improve the performance of those systems by optimizing the intervening parameters and even adapting those parameters to the specific characteristics of each share. A very promising approach would also be to combine the different indicators. In fact, the top three systems implemented so far are those based on two or more indicators. It could be that receiving a signal from one indicator is not sufficient to validate this signal. Potentially, receiving many but weak signals from different indicators has more weight than just one strong signal.

Neural networks could take all those options into account. This is the main field of interest of this study: can neural networks help improve the performance of trading systems based on technical indicators? The next section presents a simple approach to this question.

4 A simple approach based on Neural Networks

4.1 A brief explanation of Neural Networks

A Neural Network, or more precisely an Artificial Neural Network, is a non-linear classifier inspired by the functioning of the brain. It is composed of nodes, the neurons, which are interconnected. Each node corresponds to a numerical value and each edge corresponds to a weight that is applied to the value of the node from which this edge originates.

The variables are entered in the input nodes. The nodes in the intermediate layer form the so-called hidden layer. For each hidden node, a weighted sum of the input variables, plus generally a constant corresponding to an input variable equal to 1, is computed. The output of the hidden nodes is given by a Sigmoid function applied to this weighted sum. A Sigmoid function is a function with an ‘S’ shape. The Sigmoid function which is considered here is $f(x) = \frac{1}{1+e^{-x}}$, which is represented in figure 5. Mathematically, the output of a hidden node j is given by $h_j = f\left(\sum_i w^{(h)}_{ij} x_i + w^{(h)}_{0j}\right)$, where $x_i$ are the different inputs, $w^{(h)}_{ij}$ corresponds to the weight attributed to the edge going from input node i to the hidden node j and $w^{(h)}_{0j}$ is the aforementioned constant attributed to the hidden node.

The output of the hidden nodes is hence comprised between 0 and 1. The output layer does the same as the hidden layer. It takes a weighted sum of the hidden nodes plus a constant and enters it in the Sigmoid function. Mathematically, the output k is given by $o_k = f\left(\sum_j w^{(o)}_{jk} h_j + w^{(o)}_{0k}\right)$, where $h_j$ are the outputs of the different hidden nodes, $w^{(o)}_{jk}$ corresponds to the weight attributed to the edge going from hidden node j to the output node k and $w^{(o)}_{0k}$ is the constant attributed to the output node.

An example of the Neural Network which will be considered here is represented in figure 6. The inputs are given by the technical indicators. We have two outputs, one buy and one sell signal. We will interpret a signal as valid if the corresponding node is above a threshold, which can be arbitrarily defined at 0.5. With this in mind, one can interpret the constant $w^{(o)}_{0k}$ as a kind of threshold. In fact, since f(0) = 0.5 and f is increasing, for either output node the decision rule:

$$
\text{“signal if } o_k = f\left(\sum_j w^{(o)}_{jk} h_j + w^{(o)}_{0k}\right) > 0.5\text{”} \tag{8}
$$

can be translated as:

$$
\text{“signal if } \sum_j w^{(o)}_{jk} h_j > -w^{(o)}_{0k}\text{”} \tag{9}
$$

Also for the hidden nodes, the constant $w^{(h)}_{0j}$, or rather $-w^{(h)}_{0j}$, can be interpreted as the threshold above which this node is close to 1, and below which it is close to 0.

Figure 5: Sigmoid function (f(x) plotted against x, for x between −8 and 8)

Figure 6: Layout of the neural network (input layer: %B, MFI, MACD and a constant 1; hidden layer: four nodes f(x) and a constant 1; output layer: two nodes f(x), Buy and Sell)
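The computation of the hidden and output layers described above can be sketched in a few lines. This is an illustrative implementation, not the code used in the study: weights are stored as nested lists with weights[i][j] for the edge from node i to node j, the constants are kept in separate lists, and all names and numbers are hypothetical.

```python
import math


def sigmoid(x):
    """The Sigmoid function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, constants):
    """Output of one layer: f(sum_i weights[i][j] * inputs[i] + constants[j])."""
    return [
        sigmoid(sum(x * weights[i][j] for i, x in enumerate(inputs)) + constants[j])
        for j in range(len(constants))
    ]


def forward(x, w_h, c_h, w_o, c_o):
    """Return the hidden outputs h and the network outputs o."""
    h = layer(x, w_h, c_h)
    o = layer(h, w_o, c_o)
    return h, o


# Tiny network: 1 input, 1 hidden node, 1 output node.
h, o = forward([1.0], [[0.0]], [0.0], [[2.0]], [-0.5])
```

Here the weighted sum entering the output node is 2 · 0.5 = 1, which exceeds the threshold −w0 = 0.5, so the output is above 0.5, as stated by equations (8) and (9).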

4.2 How to train a Neural Network?

Training the neural network means determining the weights of the network. This is done via a gradient descent method called back-propagation. We consider here only the case where the output is binary (0/1). As mentioned, an output value above 0.5 is interpreted as 1 and a value below as 0. To train the network we need a dataset with the input variables $x_i$ and the corresponding binary variables to predict, $t_k$.

The idea of back-propagation is the following. The weights are adapted iteratively. At each iteration one first computes the outputs $o_k$ of the network. Those outputs are then compared to the target values $t_k$ so as to compute the error of each output node. The goal is to minimize the sum of the squared errors, $SSE = \sum_k (t_k - o_k)^2$. Let $w^{(l)}$, with l = h or l = o, be the matrix containing the weights of the hidden layer and of the output layer respectively. In order to minimize the SSE, the weights are updated in the opposite direction of the gradient of the error:

$$
w^{(l)} \leftarrow w^{(l)} - \eta \, \frac{\partial \sum_k (t_k - o_k)^2}{\partial w^{(l)}}, \qquad l = h, o \tag{10}
$$

where η is a parameter of the optimization called the learning rate.

As the literature about back-propagation is very complete, the development of equation (10) is not presented here. Instead, algorithm 1 presents the complete back-propagation algorithm. In order to have more compact expressions, matrix notations are used. x, h and o represent, in the form of row vectors, the values of the input, hidden and output layers respectively. f′(x) = f(x)(1 − f(x)) is the first derivative of the Sigmoid function. The notation ‘·’ is used for the usual matrix product while ‘∗’ represents the elementwise product between matrices of identical size. One can also find here the reason for the name of this algorithm: the error is “back-propagated” from one layer to the previous one to update the weights.

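Algorithm 1 The back-propagation algorithm

while local optimum not attained do
    h = f(x · w(h))
    o = f(h · w(o))
    δout = f′(h · w(o)) ∗ (t − o)
    w(o) ← w(o) + η hᵀ · δout
    δhid = f′(x · w(h)) ∗ (δout · w(o)ᵀ)
    w(h) ← w(h) + η xᵀ · δhid
end while

(ᵀ denotes the transpose, which is needed for the matrix dimensions to agree since x, h and o are row vectors.)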
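One iteration of the loop above could be implemented as follows. This is a sketch under stated assumptions rather than the original implementation: the constants are handled by appending an input equal to 1 to x and to h, the weight matrices are plain nested lists, and the hidden deltas are computed before the output weights are updated (the usual convention, which differs slightly from the order of the lines in algorithm 1). All names are hypothetical.

```python
import math


def f(z):
    """Sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_h, w_o):
    """Forward pass; a constant 1 is appended to x and h for the constants."""
    x1 = list(x) + [1.0]
    h = [f(sum(xi * w_h[i][j] for i, xi in enumerate(x1)))
         for j in range(len(w_h[0]))]
    h1 = h + [1.0]
    o = [f(sum(hj * w_o[j][k] for j, hj in enumerate(h1)))
         for k in range(len(w_o[0]))]
    return x1, h1, o


def backprop_step(x, t, w_h, w_o, eta=0.05):
    """Update w_h and w_o in place for one sample; return the SSE before the update."""
    x1, h1, o = forward(x, w_h, w_o)
    n_hidden, n_out = len(w_h[0]), len(o)
    # f'(z) = f(z)(1 - f(z)), written with the node outputs
    d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    d_hid = [h1[j] * (1 - h1[j]) * sum(d_out[k] * w_o[j][k] for k in range(n_out))
             for j in range(n_hidden)]
    for j, hj in enumerate(h1):          # w(o) <- w(o) + eta * h^T . d_out
        for k in range(n_out):
            w_o[j][k] += eta * hj * d_out[k]
    for i, xi in enumerate(x1):          # w(h) <- w(h) + eta * x^T . d_hid
        for j in range(n_hidden):
            w_h[i][j] += eta * xi * d_hid[j]
    return sum((tk - ok) ** 2 for tk, ok in zip(t, o))
```

With this layout w_h has (number of inputs + 1) rows and one column per hidden node, and w_o has (number of hidden nodes + 1) rows and one column per output. Repeating the step on the same sample makes the returned SSE decrease.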

4.3 The training system

To train the network, we need an optimal target $t_k$ for the buy and the sell output node for every day. For this, we need an optimal benchmark system which gives good signals. The systems which have been presented before are of course not suited for this.

Instead, a better system must be developed. For this, the best information that could be available is used: future information. This is what will be referred to as the oracle. The oracle gives the 7-day forward simple moving average of the price. We thus have an indicator of the future price evolution.

Based on this, one can define the following decision rule:

• IF oracle > close + 1.3*stdDev20 THEN buy

• IF oracle < close - 1*stdDev20 THEN sell

This system also needs as input the 20-day standard deviation of the close price. The system buys when the future average price is significantly above the current price and sells if it is significantly below. As the price increase or decrease is significant and located in the near future, it should ideally be reflected in today’s indicators.
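The oracle rule can be sketched as follows. Two details are not specified in the text and are assumptions of this example: the forward average is taken over the next 7 closes (excluding the current day), and stdDev20 is the sample standard deviation of the last 20 closes including the current day. The function name and the test prices are hypothetical.

```python
from statistics import mean, stdev


def oracle_signals(close, horizon=7, window=20, k_buy=1.3, k_sell=1.0):
    """Return one entry per day: "buy", "sell" or None."""
    signals = []
    for t in range(len(close)):
        # Not enough history for stdDev20 or not enough future for the oracle.
        if t < window - 1 or t + horizon >= len(close):
            signals.append(None)
            continue
        oracle = mean(close[t + 1:t + 1 + horizon])    # forward moving average
        std_dev = stdev(close[t - window + 1:t + 1])   # 20-day standard deviation
        if oracle > close[t] + k_buy * std_dev:
            signals.append("buy")
        elif oracle < close[t] - k_sell * std_dev:
            signals.append("sell")
        else:
            signals.append(None)
    return signals


# A quiet price around 10 that jumps to 20 after day 24.
prices = [10.0 + 0.1 * (i % 2) for i in range(25)] + [20.0] * 10
signals = oracle_signals(prices)
```

In this constructed series the oracle gives a buy signal on the last day before the jump, because the future average lies far above the current close plus 1.3 standard deviations.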


4.4 Results of the neural network system

We have developed a system which gives reliable training signals. Not surprisingly, the oracle performs very well on all 30 stocks. It has an average annual return of 22.7% with a standard deviation of 3.4%. It holds the share on average for over 40 days and all trades are profitable.

To train the network, one randomly chooses a day in the data set, computes the output of the oracle and trains the network on this output. As we want to capture the overall pattern in the data and not the specificities of the last days on which the network was trained, a small value for the learning rate (η = 0.05) has been chosen. This ensures that each day can only very slightly modify the weights of the network. It avoids overfitting the network to the specificities of each day. The network is then trained over 10’000 randomly selected days.

The results of this approach are very disappointing. Most of the time, the network gives no signal over the whole period. Conversely, during some periods, it gives both signals at the same time. As only the buy signal counts when the share is not held and only the sell signal counts when we are in an open position, this results in alternating buy and sell decisions. Very frequently it is even observed that the network gives some signals, which are even profitable, but as learning continues the profit degrades and sometimes the system even stops giving signals. This holds even for very short periods of only 100 days.

The problem of signal alternation has been partly addressed by an additional feature. A buy signal is only considered if at the same time the sell signal is sufficiently weak, and vice versa for the sell signal. This decreases the likelihood that a signal is reversed the next day. More precisely, a signal is only considered if it is higher than 0.55, while the other signal is below 0.45.
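The validation of the two output nodes described above can be sketched as a small function. It combines the 0.55/0.45 thresholds with the rule that only the buy signal counts when the share is not held and only the sell signal counts in an open position. The function and argument names are hypothetical.

```python
def validated_signal(buy_out, sell_out, holding, high=0.55, low=0.45):
    """Return "buy", "sell" or None from the two network outputs.

    buy_out, sell_out : values of the buy and sell output nodes (0 to 1)
    holding           : True if the share is currently held
    """
    if not holding and buy_out > high and sell_out < low:
        return "buy"
    if holding and sell_out > high and buy_out < low:
        return "sell"
    return None
```

When both nodes are high at the same time, for example 0.8 and 0.7, no signal is validated, which is the case the additional feature is meant to filter out.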

One can give several reasons why an approach based on back-propagation does not work here:

1. The weights of the net are initialized randomly. The problem of back-propagation is that it is a local search algorithm which gets stuck in local optima. The neighborhood of the global optimum is probably a very narrow place in the whole search space. It is very unlikely that random initialization will bring us close to this global optimum. At the same time, a network which gives no signal seems to be a local optimum, towards which the network converges easily. Making no profit, but also no loss, seems very attractive. This explains why states in which some signals are produced are frequently left again. It has in fact been observed that when the network is trained on data which includes mostly upturn phases, the fact that the net gives no signals is less of a problem. Giving no signal in an up market is not necessarily optimal. If, however, the testing period includes up and down phases, giving no signal becomes more attractive.

2. As the oracle gives signals only when the future price is significantly above or below the current price, it gives no signal most of the time. Like many other algorithms in machine learning, back-propagation has a bias towards the most frequent value of the output. In fact, a good fit of the data is given by a classifier which gives no signal at all, as this prediction is true most of the time. This explains why so often no signal is observed. But even restricting the learning samples to days on which a signal of the oracle was observed does not help to overcome the problem (see figure 7).

3. Not every small trend the oracle is able to predict is truly reflected in the indicators. For example, some beginnings of trends are not preceded by a pattern change in volumes. In this case it makes no sense to train our volume-based indicators on the signals of the oracle. Another example is that indicators are usually designed to detect the beginning of a trend. The network is however also trained during a trend, during which it should continue to ‘confirm’ the signal. However, only the first signal has real importance. Training the network during the trend hence adds only little value.

Figure 7: Results of back-propagation on 500 days of the Accenture share. (a) Trading signals at end of training phase (“System based on NN for ACN”, price against date, from 24/09/2004 to 19/09/2006). (b) Evolution of profitability during learning phase (profit against the number of iterations, from 0 to 10000). The inputs to the network are ADX, DI+ - DI-, %B, MFI and MACD. The network is trained only on dates on which the oracle gives a signal. The network gives signals, which is due to the fact that the share is mostly in a phase of upturn. However, the signals are useless, as they are very grouped and alternate between buy and sell.

One could try to overcome some of these problems, but it is very unlikely that all of them can be overcome. The next section presents a more promising method.

5 Neural Network training based on genetic algorithms

5.1 Overview of the approach

This section presents an approach which is likely to overcome the problems of back-propagation mentioned above. It is based on genetic algorithms.

In order to overcome the problem of local convergence (reason 1), a global optimization algorithm must be used. Genetic algorithms are such global optimizers.

We would also like the training method not to depend on the oracle anymore. To achieve this, a global objective function will be defined. It clearly targets what we finally want: an overall profitable trading system. We do not necessarily want a trading system that perfectly fits the predictions of the oracle. The system should give reliable signals indicating when an upturn begins and when it ends. As mentioned in reason 3, signals during the trend are less important. As the objective function depends only on the first buy signal and the first sell signal related to each trade, this problem is also solved.

As will be shown, there is still a problem with the number of signals (reason 2). But defining an overall objective function can also help to easily address this problem (see section 5.3 below).

5.2 The genetic algorithm

The genetic algorithm is represented in pseudo-code below in algorithm 2. The population size has been set to 100. Each chromosome represents the weights of the neural network. It takes the form of the two weight matrices w(h) and w(o).

Algorithm 2 The Genetic algorithm

 1: Input: Learning Data (indicators and prices), # indicators (Ni), # hidden nodes (Nh), Population Size (Nc)
 2: Output: best of population
 3: Randomly initialize population
 4: for Chromosome in population do
 5:     Evaluate fitness Φ of Chromosome
 6: end for
 7: while last improvement < 75 steps ago && # iterations < 250 do
 8:     mom = select randomly from top χ% in population (χ increasing linearly from 30% to 90%)
 9:     dad = select randomly from population different from mom
10:     offsprings = getCrossoverPopulation(mom, dad)
11:     With probability 85%:
            Evaluate fitness Φ of offsprings
            candidate = best of offsprings
        Otherwise:
            candidate = select randomly from offsprings
12:     offsprings = getMutationPopulation(candidate)
13:     Evaluate fitness Φ of offsprings
14:     Replace worst chromosome in population by best chromosome in offsprings
15:     Select randomly one chromosome from the 50% worst of the population and one offspring
16:     if Chromosome from offsprings not much worse than that of the population then
17:         Replace chromosome in population by chromosome in offsprings
18:     end if
19: end while

The indicators in the input have been normalized, that is, their mean has been subtracted and the result divided by their standard deviation. This has been done for several reasons. The indicators that have been defined have very different ranges. Some are positive while others can also take negative values. Some are directly proportional to the daily volume and thus take very high values (e.g. ADL). Others take values between 0 and 100 (e.g. ADX, MFI or RSI), while others only between 0 and 1 (e.g. %B or Chaiki Money Flow). Normalizing the indicators makes sure that all indicators have the same range. They are situated around zero with a unit standard deviation. The advantage of this is that it also provides us with an idea of the range of the weights. If this normalization had not been done, indicators which take very high values would have very low weights and vice versa. First, it would be difficult for an optimization algorithm to obtain a sufficient level of precision at the same time for small and big weights. Moreover, initializing the weights would be somewhat difficult.

Another reason relates to the objective function, which will be defined in the following subsection. As is usually done when neural networks are trained by optimizing an objective function, large weights are punished. If the indicators were not normalized, this would simply result in diminishing the importance of indicators which take low values and which need as a consequence larger weights to influence the output of the network.
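The normalization described above is a standard z-score. A minimal sketch is given below; it assumes that the mean and the (population) standard deviation are computed over the whole series passed to the function, which is a choice of this example and not a detail stated in the text.

```python
from statistics import mean, pstdev


def normalize(values):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# An indicator ranging from 0 to 100 and a volume-driven one with huge values
# end up on the same scale.
z_small = normalize([0.0, 50.0, 100.0])
z_large = normalize([1.0e6, 2.0e6, 3.0e6])
```

Both normalized series are centered on zero with a unit standard deviation, so weights of order one are appropriate for either indicator.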

Finally, the fact that we normalize is helpful for putting a limit on the absolute value of the weights. This will be beneficial when it comes to initializing the chromosomes on the one hand, and for the crossover operator on the other hand. Consider the decision rule from equation (9). This decision rule is not modified if one multiplies all weights (including the threshold) by a positive constant. The interpretation of the threshold for the hidden node will also not be modified in this case. One can therefore put an arbitrary limit on the weights. This limit has been chosen here equal to one, which is also the order of magnitude that has been given to the input³. For the thresholds however, the weight must be somewhat larger. Suppose that all weights are close to 1. In this case, limiting the threshold to 1 would be too restrictive. A properly defined threshold must be allowed to go close to the maximum of the sums $\sum_{i=1}^{N_i} w^{(h)}_{ij} x_i$ and $\sum_{j=1}^{N_h} w^{(o)}_{jk} h_j$. Theoretically, the absolute value of the hidden layer thresholds should be bounded by the number of inputs and that of the output layer thresholds by the number of hidden nodes; in other words, by the number of terms of the sums. In practice this value can however be set lower, as not all weights are systematically close to one. The maximum is hence also set equal to one for the thresholds, as this gave the best results.

To initialize the chromosomes (line 3), the corresponding weights simply take a random value which is uniformly distributed between -1 and 1. The minimum and maximum bounds defined this way will henceforth be referred to as wmin and wmax.

The genetic algorithm uses a double stop criterion (line 7). One criterion limits the maximum number of iterations, the other the number of iterations without any improvement of the best chromosome.

Of the two parents which are chosen as candidates, one is taken among the χ best percent of the population (line 8). The percentage χ varies linearly from 30% to 90% over the 250 iterations. This ensures that at least one parent has a sufficiently good fitness, while giving more importance to exploration in the beginning and, as the iterations continue, more importance to the exploitation of good solutions.

The crossover operator (line 10) returns 5 different offspring. Those operators have been inspired by [9]. Let $w_1$ and $w_2$ be the notation for the two parents to cross over. The following offspring ($w^i_{os}$) have been defined. The first offspring is simply the average of the weights of the two parents, $w^1_{os} = \frac{w_1 + w_2}{2}$. This operation looks for a solution located in the search space just between the two parents. The second offspring is given by $w^2_{os} = (1 - \omega)\, w_{max} + \omega \max(w_1, w_2)$. Omega is a constant set equal to ω = 0.8. This operator explores the weights located above the two parents in the search space. The values below are explored by the third operator, $w^3_{os} = (1 - \omega)\, w_{min} + \omega \min(w_1, w_2)$. Operators four and five perform a one-point crossover for each column of the two matrices defining the weights. As a reminder, the first matrix represents the weights associated to the hidden nodes. The columns are associated to the hidden nodes, while the rows are associated to the input variables (or the constant). It is of size $(N_i + 1) \times N_h$. The second matrix represents the output nodes. The columns are associated to the outputs, while the rows are associated to the outputs of the hidden nodes (or the constant). The cut points are chosen randomly. The example below represents the crossover of the hidden layer matrices of a network with three indicators and three hidden nodes ($N_i = 3$, $N_h = 3$).

$$
\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} \\
\alpha_{21} & \alpha_{22} & \alpha_{23} \\
\alpha_{31} & \alpha_{32} & \alpha_{33} \\
\alpha_{41} & \alpha_{42} & \alpha_{43}
\end{pmatrix}
\sim
\begin{pmatrix}
\beta_{11} & \beta_{12} & \beta_{13} \\
\beta_{21} & \beta_{22} & \beta_{23} \\
\beta_{31} & \beta_{32} & \beta_{33} \\
\beta_{41} & \beta_{42} & \beta_{43}
\end{pmatrix}
=
\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} \\
\beta_{21} & \alpha_{22} & \alpha_{23} \\
\beta_{31} & \alpha_{32} & \beta_{33} \\
\beta_{41} & \beta_{42} & \beta_{43}
\end{pmatrix}
\tag{11}
$$

This operator generates a random combination of the two parents.

In 85% of the cases, the best chromosome which is generated is chosen as candidate $w_c$ for mutation (line 11). In the other 15% of the cases, the chromosome is randomly chosen among the offspring.

³ The size of the weights is still important for the output of the hidden nodes. But anyway, the limit is not respected strictly. This means that nothing in the implementation forbids a weight from taking values above 1. As mentioned, it is only used for initialization and the crossover operator.
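The five crossover operators described above can be sketched for a single weight matrix as follows. This is an illustration only: the text does not say how operators four and five differ, so the example assumes that they swap the roles of the two parents. The matrices need at least two rows, and a full chromosome would apply the same operation to both w(h) and w(o). All names are hypothetical.

```python
import random

W_MIN, W_MAX, OMEGA = -1.0, 1.0, 0.8


def one_point_by_column(top, bottom, rng):
    """For each column, take the rows above a random cut from `top`, the rest from `bottom`."""
    rows, cols = len(top), len(top[0])
    child = [row[:] for row in top]
    for c in range(cols):
        cut = rng.randint(1, rows - 1)   # at least one row from each parent
        for r in range(cut, rows):
            child[r][c] = bottom[r][c]
    return child


def crossover(w1, w2, rng=random):
    """Return the five offspring of two weight matrices of identical size."""
    rows, cols = len(w1), len(w1[0])

    def combine(fn):
        return [[fn(w1[r][c], w2[r][c]) for c in range(cols)] for r in range(rows)]

    average = combine(lambda a, b: (a + b) / 2)
    upper = combine(lambda a, b: (1 - OMEGA) * W_MAX + OMEGA * max(a, b))
    lower = combine(lambda a, b: (1 - OMEGA) * W_MIN + OMEGA * min(a, b))
    return [average, upper, lower,
            one_point_by_column(w1, w2, rng),
            one_point_by_column(w2, w1, rng)]


parent_a = [[0.5] * 3 for _ in range(4)]
parent_b = [[-0.5] * 3 for _ in range(4)]
offspring = crossover(parent_a, parent_b, random.Random(0))
```

With these two constant parents the average offspring is zero everywhere, the second offspring is 0.6 and the third −0.6, while the last two mix rows of the two parents column by column as in equation (11).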

The mutation operator adds some small random perturbations to the weights of the candidate $w_c$ (line 12). Only a few, randomly selected weights are modified by adding a small random constant which might be positive or negative. Seven mutation offspring are defined, which differ by the share of weights which are modified on the one hand, and by the size of the random perturbations on the other hand. To come back to our previous example, after the mutation, the crossover candidate from above might be affected like this:

$$
\begin{pmatrix}
\alpha_{11} & \alpha_{12} + \varepsilon_2 & \alpha_{13} \\
\beta_{21} & \alpha_{22} & \alpha_{23} \\
\beta_{31} + \varepsilon_1 & \alpha_{32} & \beta_{33} \\
\beta_{41} & \beta_{42} & \beta_{43} + \varepsilon_3
\end{pmatrix}
\tag{12}
$$
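A possible implementation of such a mutation operator is sketched below. The seven pairs of values (share of modified weights, size of the perturbation) are not given in the text, so the ones in SETTINGS are purely hypothetical. The uniform distribution of the perturbation is also an assumption.

```python
import random

# Hypothetical (share of weights modified, maximum perturbation) pairs.
SETTINGS = [(0.05, 0.01), (0.05, 0.05), (0.10, 0.01), (0.10, 0.05),
            (0.20, 0.01), (0.20, 0.05), (0.30, 0.10)]


def mutate(weights, share, scale, rng=random):
    """Copy the matrix and perturb each weight with probability `share`."""
    mutant = [row[:] for row in weights]
    for row in mutant:
        for c in range(len(row)):
            if rng.random() < share:
                row[c] += rng.uniform(-scale, scale)   # positive or negative
    return mutant


def mutation_population(candidate, settings=SETTINGS, rng=random):
    """Return one mutant per (share, scale) setting."""
    return [mutate(candidate, share, scale, rng) for share, scale in settings]


candidate = [[0.0] * 3 for _ in range(4)]
mutants = mutation_population(candidate, rng=random.Random(1))
```

The candidate itself is left untouched, so it can still be compared with its mutants when the fitness is evaluated.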

The best mutant obtained then enters the population by replacing the chromosome with the lowest fitness (line 14). A second, randomly chosen mutant replaces a randomly chosen chromosome in the population, which belongs to the 50% worst chromosomes in the population (line 15). This has two aims. At the beginning of the iterations, it helps to rapidly replace the worst chromosomes that have been initialized. Later on, when the average quality of the population increases, it helps to keep the population heterogeneous and to prevent the algorithm from being too greedy. This is important to prevent the algorithm from getting stuck in a local optimum. Getting stuck in a local optimum is a serious problem, as we have already noticed with back-propagation, but even with the genetic algorithm this remains an issue. One can therefore confirm that one is facing a very complicated optimization problem.

5.3 The objective function

In this subsection the objective function Φ(·) will be defined. This function evaluates the fitness of the different chromosomes w over a certain learning period. For this, it requires as input the (normalized) values of the indicators for each day and the close prices.

A very basic approach would simply reward the profit and punish large weights. We punish the sum of the squared weights, noted $\|w\|_2^2$. This approach however yields unacceptable results. The network generated only a few trades. Generally there is not even a sell signal, and the end of the period is used as a stop, as an open position is automatically sold. Of course, a good profit which is generated this way is due only to over-fitting and is not reproducible.

Different solutions had to be considered in order to control the pattern and especially the number of signals that were generated. Finally, all of those solutions had to be combined to give the best suited results.

It is also important to note that the terms intervening in the objective function also had to be normalized in order to have a predictable order of magnitude. Consider for instance the profit. Logically, longer periods are more likely to generate higher profits than shorter periods. As the learning phase has different lengths for the various shares, the profit is normalized by the length of the period.

In order to force the network to produce a sufficiently high number of trading signals, we punish a system which generates “too few” signals. At the same time, it will be observed that the network might also generate “too many” signals, mostly in the form of alternating buy and sell signals. It has been tried to determine an optimal ratio of the number of signals per day, which should be targeted in the objective function. But such attempts failed, as this optimum ratio varies a lot from one share to another and from one market condition to another. A better approach defines a minimum and a maximum ratio of signals per day, below respectively above which the deviation from those thresholds is punished. Those threshold ratios have been defined equal to 1.5 respectively 4 signals in 100 days. The analytic expression of the punishment function takes the form:

$$
\max\left[\, 0.015 - \frac{\#signals}{\#days},\ \frac{\#signals}{\#days} - 0.04,\ 0 \,\right]^6 \tag{13}
$$

and is represented in figure 8.

Figure 8: Punishment function of equation (13) (“Punishment of too few or too many signals”: punishment plotted against signals/day between 0 and 0.05, on a vertical scale of the order of 10^-11)
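Equation (13) translates directly into code. The sketch below uses hypothetical names and keeps the two thresholds and the exponent as parameters.

```python
def signal_punishment(n_signals, n_days, low=0.015, high=0.04, power=6):
    """Punishment of equation (13) for too few or too many signals per day."""
    ratio = n_signals / n_days
    return max(low - ratio, ratio - high, 0.0) ** power


inside = signal_punishment(2, 100)     # 2 signals in 100 days: within the band
too_few = signal_punishment(0, 100)    # no signal at all
too_many = signal_punishment(5, 100)   # 5 signals in 100 days
```

Between 1.5 and 4 signals per 100 days the punishment is exactly zero. Without any signal it equals 0.015^6, which is about 1.1 · 10^-11 and corresponds to the order of magnitude of the vertical axis of figure 8.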

This takes however not into account that we would like to generate goodprofit with the fewest signals possible. Therefore the number of signals per dayis punished linearly. This is such as taking trading costs into account.

Even with all this, it still frequently occurs that buy and sell signals alternate. To make sure that such chromosomes are more efficiently improved or even rejected, the number of times a buy and a sell signal are less than 4 days apart is also punished linearly.
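A minimal sketch of how such alternating signals could be counted is given below. It assumes that signals are stored as (day, kind) pairs sorted by day and that only consecutive signals are compared; both the data layout and this reading of the rule are our assumptions, not necessarily the implementation used for the report.

```python
def count_close_alternations(signals, min_gap=4):
    """Count consecutive opposite signals that are less than min_gap days apart.

    signals: list of (day_index, kind) with kind in {"buy", "sell"}, sorted by day.
    """
    count = 0
    for (day_a, kind_a), (day_b, kind_b) in zip(signals, signals[1:]):
        if kind_a != kind_b and day_b - day_a < min_gap:
            count += 1
    return count


# Made-up example: two quick reversals, then a normal trade.
example = [(0, "buy"), (2, "sell"), (3, "buy"), (20, "sell"), (25, "sell")]
n_alternating = count_close_alternations(example)
```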

One potential way to further reduce alternating signals, and simply to control the number of signals, would be to increase the threshold for considering a signal as valid. As mentioned above, a signal is only validated if the corresponding output node is above 0.55 while the other output node is below 0.45. Those same thresholds are also used when it comes to evaluating the objective function. One could imagine increasing the former while decreasing the latter threshold when it comes to validating the system. This way only the strongest, hence most promising, signals would be retained. However, such an approach resulted only in decreased profits, as the buy signals were simply delayed by a few days, while it did not substantially reduce the number of (alternating) signals. This solution has hence not been retained.

[Chart “System based on NN for ACN”: price (25 to 65) against date (9/07/2007 to 24/06/2011).]

Figure 9: Results of the system on 1’000 days of the Accenture share over the training period. The inputs to the network are ADX, DI+ − DI-, %B, MFI and MACD. The objective function rewards profit and punishes “too few” and “too many” signals, alternating signals and the sum of the squared weights

Figure 9 shows the result of the training method using the objective function as it has been presented so far. The results are still not satisfying. This is mainly because the results that are obtained from the training are very diverse. Some systems perform very well on the number of signals they generate, with only modest profit. Others have good profit but still very few signals. To get a system which performs very well on both, we reward a term which is directly proportional to the profit and inversely proportional to the punishment related to the signals.

It is also observed that most training results include very long holding periods, which also include small phases of downturn. It would hence be more profitable to split those open positions into smaller pieces which hold the shares only in the up-phases. The trailing stop defined above is well suited for this purpose. But when the network is trained using the trailing stop to evaluate the objective function, it is observed that the network itself does not produce any sell signal anymore. However, the sell signals of the system are very useful, as they might avoid losing each time the 5% drop from the peak that the trailing system requires before taking the profit. The system will therefore be trained using a less strict trailing system, which sells only after a 10% loss.
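The difference between the strict (5%) and the relaxed (10%) trailing stop can be illustrated with the sketch below. The exact rule used in the report is defined in an earlier section; here it is assumed that the position is closed on the first day on which the price lies at least the given fraction below the highest price reached since entry. Function name and price series are made up.

```python
def trailing_stop_exit(prices, entry, max_drop):
    """Return the first day on which the price is max_drop below the running
    peak since entry, or None if the stop is never triggered."""
    peak = prices[entry]
    for day in range(entry + 1, len(prices)):
        peak = max(peak, prices[day])
        if prices[day] <= peak * (1.0 - max_drop):
            return day
    return None


prices = [100, 110, 120, 113, 107, 130]  # hypothetical closing prices
strict_exit = trailing_stop_exit(prices, entry=0, max_drop=0.05)
relaxed_exit = trailing_stop_exit(prices, entry=0, max_drop=0.10)
```

On this series the strict stop closes the position one day earlier than the relaxed one, which leaves the network less room to produce its own sell signal.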


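The final objective function hence takes the form:

\[
\Phi(w, p, I) = \alpha \frac{\text{profit}}{\#\text{days}} - \beta \frac{\#\text{signals}}{\#\text{days}} - \psi - \gamma\, \#\text{alternating signals} - \delta \|w\|_2^2 + \nu\, \frac{\dfrac{\text{profit}}{\#\text{days}}}{\xi + \dfrac{\#\text{signals}}{\#\text{days}} + \psi} \tag{14}
\]

where p is the vector of prices for the learning period, I is the matrix of the corresponding indicators and:

\[
\psi = \max\left(\theta \left[\, 0.015 - \frac{\#\text{signals}}{\#\text{days}},\ \ \frac{\#\text{signals}}{\#\text{days}} - 0.04,\ \ 0 \,\right]\right)^{6}
\]

α = 20
β = 10
γ = 0.25
δ = 0.5
ν = 0.25
θ = 250
ξ = 0.02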

The values of the parameters have been selected such that all terms of the function have similar orders of magnitude. Furthermore, they have been optimized experimentally. For instance, the term relating to profit should have more importance than the control of the number of signals or the size of the weights.
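For concreteness, equation (14) can be written as the following Python sketch. The function and argument names are illustrative; the parameter values are those listed above. The squared norm of the weights is passed directly so that the function can be evaluated from summary figures.

```python
ALPHA, BETA, GAMMA, DELTA, NU, THETA, XI = 20.0, 10.0, 0.25, 0.5, 0.25, 250.0, 0.02


def psi(n_signals, n_days):
    """Scaled punishment of too few or too many signals per day."""
    rate = n_signals / n_days
    return (THETA * max(0.015 - rate, rate - 0.04, 0.0)) ** 6


def objective(profit, n_signals, n_alternating, l2_sq, n_days):
    """Equation (14); l2_sq is the sum of the squared weights."""
    profit_rate = profit / n_days
    signal_rate = n_signals / n_days
    p = psi(n_signals, n_days)
    return (ALPHA * profit_rate
            - BETA * signal_rate
            - p
            - GAMMA * n_alternating
            - DELTA * l2_sq
            + NU * profit_rate / (XI + signal_rate + p))


# Run 1 of table 4, ASSUMING two signals per trade and no alternating signals.
phi_run1 = objective(profit=156.3, n_signals=20, n_alternating=0,
                     l2_sq=0.92, n_days=1222)
```

Under these two assumptions the sketch gives a value of about 2.81 for run 1 of table 4, in line with the Φ reported there. A system without any signal on 100 days is punished by ψ alone with roughly 2’780, which matches the order of magnitude of the worst fitness values mentioned in the next section.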

5.4 An additional restart feature

It has been observed that the genetic algorithm (algorithm 2) finds good solutions very quickly. The large population of 100 individuals is very useful for two reasons. First, it ensures that the population includes at least a few good chromosomes from the very beginning. Second, the population stays very heterogeneous for a long time, so that the crossover operator can further explore the very large search space. As the chromosomes in the population get better and more homogeneous, it takes more iterations until further improvements are found. Keeping the population heterogeneous is therefore important for the performance of the algorithm. As mentioned, this is the purpose of the random insertion at line 15.

It has been observed that going beyond 250 iteration steps is not useful, as the algorithm seems to stagnate at the best solution it has found so far. Instead, it is more useful to completely restart the algorithm. Finally, the following solution is retained: the algorithm as presented in algorithm 2 is run 3 times with a completely randomly initialized population of size 100. Those three trials generally give very similar results, while there are still some slight differences across the outcomes. As a matter of fact, the objective function defined above has its maximum at a value between 0 and 10, depending on the share and the period. Badly initialized solutions have fitness values which can easily go down to -3’000. Of course those values have no units and no specific interpretation, and any monotonic transformation would not modify the outcome of the algorithm. But as the objective function crosses zero during its convergence, one cannot use relative (percentage) terms to compare the outcomes of several runs of the training algorithm. For comparison, one hence has to use absolute terms and relate them to the total range of the objective function. With this in mind, the outcomes over the different runs still differ by a value of more than 1 between the best and the worst outcomes.

Each of the best chromosomes obtained during the three runs is then injected three times into a smaller population of size 25, on which the genetic algorithm is run again. The goal of this last run is rather to perform a local optimization of the best solutions obtained so far. The population should hence be more homogeneous, which explains the smaller population size and the fact that each good chromosome is represented three times from the beginning. It is observed that the already very good chromosomes obtained during the previous runs can still be slightly improved.

Algorithm 3 The Restart algorithm

1: Output: best
2: for all i ∈ 1, . . . , 3 do
3:     best_i ← runGA() with Nc = 100 (algo 2)
4: end for
5: best ← runGA() with Nc = 25 containing 3 times each best_i (algo 2)
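The control flow of algorithm 3 can be sketched as follows. The genetic algorithm itself (algorithm 2) is replaced here by a deliberately trivial stand-in, `toy_run_ga`, which only does a random search on a one-dimensional toy fitness; this stand-in and all names are hypothetical and serve only to make the restart logic runnable.

```python
import random


def fitness(x):
    """Toy fitness with its maximum at x = 3 (not the trading objective)."""
    return -(x - 3.0) ** 2


def toy_run_ga(pop_size, seeds, rng):
    """Stand-in for algorithm 2: fill the population randomly around the
    injected seeds and return the fittest individual."""
    population = list(seeds)
    population += [rng.uniform(-10.0, 10.0) for _ in range(pop_size - len(seeds))]
    return max(population, key=fitness)


def restart(run_ga, rng, n_runs=3, big=100, small=25, copies=3):
    """Algorithm 3: n_runs independent large runs, then one small run seeded
    with `copies` copies of each intermediate best."""
    stage_bests = [run_ga(big, [], rng) for _ in range(n_runs)]
    seeds = [b for b in stage_bests for _ in range(copies)]
    best = run_ga(small, seeds, rng)
    return best, stage_bests


best, stage_bests = restart(toy_run_ga, random.Random(0))
```

Because the intermediate bests are injected into the final population, the final result can never be worse than any of the three preceding runs.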

This restart algorithm reliably yields very good solutions. A typical convergence curve is shown in figure 10.

[Plot: objective function Φ (−10 to 5) against # iteration (0 to 1000).]

Figure 10: Convergence of the genetic algorithm using the restart feature. The red line represents the fitness of the best chromosome in the population. The blue line represents the average fitness of the population. The black vertical lines represent a restart. The curve corresponds to the training on the Delhaize share over 1’222 days

6 Performance tests of the Neural Network

6.1 Approach

In order to test the approach defined in the previous section, the financial data which is available for our test candidates is split into two different sets. The first set is the learning or training set, which is used for training the network. More precisely, it is used to evaluate the objective function. The second set is the testing or validation set, which is used to evaluate the performance of the network. By splitting the data into two (time-) independent sets, we make sure that the network is evaluated on data it has not seen during training, so that overfitting cannot inflate the measured performance. The testing phase is simply made of the last 1’000 days of the data set.

Concerning the structure and the inputs to the network, one must make a compromise between the number of hidden nodes and variables to include on the one hand, and the complexity of the optimization problem on the other hand. The size of the hidden-weight matrix w(h) is given by (Ni + 1) × Nh and that of the output-weight matrix by (Nh + 1) × No, where Ni, Nh and No are the number of input, hidden and output nodes respectively. For a network with 10 input variables and 7 hidden nodes, 93 different weights must be estimated. Increasing the search space increases the likelihood that the optimization algorithm returns a solution which is further away from the optimum.
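The weight counts quoted in this report follow directly from these two matrix sizes, as the short sketch below shows. It assumes the two output nodes (buy and sell) used throughout; the function name is ours.

```python
def n_weights(n_in, n_hidden, n_out=2):
    """Number of weights of a one-hidden-layer network, the +1 accounting
    for the bias input of each layer."""
    hidden_matrix = (n_in + 1) * n_hidden
    output_matrix = (n_hidden + 1) * n_out
    return hidden_matrix + output_matrix


example_from_text = n_weights(10, 7)  # 10 inputs, 7 hidden nodes
```

For 10 inputs and 7 hidden nodes this gives 11 × 7 + 8 × 2 = 93 weights, the figure stated above.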

For this reason, the indicators that have been defined so far cannot all be included at the same time. As the optimization problem is computationally heavy and the execution of the performance measures on the 30 shares takes about 4 hours, it would be too time consuming to test all different combinations of indicators. Therefore it is useful to make a pre-selection of the indicators. It was decided to include the indicators which gave good results when used alone with their corresponding decision rule (see table 3). Those indicators are the ADX, the difference of DI+ and DI-, the %B, the MFI, the MACD histogram, the CMF, the slow %D and the force-20. Moreover, the 20-day standard deviation of the price was included as a measure of risk, as it was observed that this increases performance. The network structures that were analyzed included between 5 and 7 hidden nodes. This way the size of the search space was always below 100 weights. Of course the performance of the indicators alone must only be seen as a proxy for their potential to perform well within the network. The standard deviation of the price, for instance, cannot be used alone; yet it adds some value to the network. The same might hold for another indicator, which might only be useful in combination with other indicators. But as mentioned, testing all different combinations of indicators would be too time consuming, so that the individual performance is the best information at our disposal to make the pre-selection.

In order to determine which of the pre-selected indicators should eventually be retained, each of the variables was excluded individually. Using this approach it was observed that the best model has 6 hidden nodes. Leaving an indicator out generally resulted in a decline (however insignificant) of the annual return and mostly even in an increase of the standard deviation of the returns across the different shares, hence of the riskiness of the system.

But unfortunately it has been established that interpreting these results actually makes no sense. The following section explains why an approach based on technical indicators and neural networks (such as presented here) does not lead to a usable trading system.

         Objective function                        Learning phase      Testing phase
Run      Φ      profit   #trades   ||w||_2^2       profit   #trades    profit   #trades
1        2.81   156.3    10        0.92            139.5    25           1.0    22
2        3.18   185.2     9        1.20            101.9    17          -7.1    17
3        2.51   154.4    11        0.93            136.2    25           7.2    26
4        3.01   224.2     9        1.65            126.2    15          -6.4    12
5        2.38   134.0    10        0.81            112.9    23         -12.1    23
6        2.80   158.7    11        0.95            168.1    26         -43.4    36

Table 4: Objective function components, learning and testing results over 6 independent runs on the Delhaize (DELB) share. The training period includes 1’222 days, on which the share makes a profit of €275.5 (for €100 invested initially), which corresponds to an annualized return of 31.4%. The testing period includes 1’001 days, on which the share makes a loss of €18.4 (for €100 invested initially), which corresponds to an annualized return of -5.0%.
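The annualized returns quoted in the caption of table 4 can be reproduced with the sketch below. It assumes compounding over 252 trading days per year; this convention is our assumption, as the report does not state it in this section, but it is consistent with both figures.

```python
def annualized_return(profit_per_100, n_days, days_per_year=252):
    """Annualized return for a profit expressed per 100 invested initially."""
    growth = 1.0 + profit_per_100 / 100.0
    return growth ** (days_per_year / n_days) - 1.0


training = annualized_return(275.5, 1222)  # buy-and-hold, training period
testing = annualized_return(-18.4, 1001)   # buy-and-hold, testing period
```

With these inputs the training period comes out at roughly 31.4% per year and the testing period at roughly -5.0% per year.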

For the sake of completeness of this study, an Excel file (NN.xls) is attached to this report, which includes the detailed results of the different experiments. The following section also refers to this file when numerical results are cited. The corresponding sheet is mentioned in a footnote.

6.2 Robustness of the trading system

A good trading system is of course characterized by the fact that its results are reproducible. In our case, this means that the results of the trading system on the testing set on the one hand, and on the learning set on the other hand, are identical over different runs. No trader would use a system whose performance differs across different trials under exactly the same conditions. Given the complexity of our optimization problem, one issue that might arise is that the genetic algorithm is incapable of finding (a network close to) the global optimum at each run.

Table 4 shows the results over 6 independent runs of the training on the Delhaize share. The table suggests that the concerns about the robustness of the genetic algorithm were not founded. The restart feature ensures that the values taken by the objective function differ only slightly compared to the whole range of the function’s image. Also on a more detailed level the outcomes of the training are very similar. In fact the table shows the three important components of the objective function: the profit, the number of trades (each corresponding to one buy signal and one subsequent sell signal) and the sum of the squares of the weights. All these measures are very close across the different runs.

The results begin to differ when introducing the complete trailing system, i.e. stopping at a 5% loss instead of the 10% used to evaluate the objective function. This is due to the fact that the trailing stop closes open positions earlier than in the objective function. The system might now produce new signals during a period on which it was, in a sense, not trained, as any further buy signal was ignored while the position on the share was open. Incidentally, this is what explains the increase in the number of trades when the trailing system is fully activated.

[Two charts “System based on NN for DELB”: price against date (11/02/2003 to 20/11/2007) with the trades of the system. (a) Run 1, (b) Run 6.]

Figure 11: Buy and sell signals over the training period of two independent runs on the Delhaize share

However, those differences are only marginal. Figure 11 shows the trades of the system on the training period for run 1 and run 6 from table 4. Those two runs had very similar objective functions. One observes only very slight differences despite the activation of the trailing system.

When the system is applied to the testing phase, however, the results differ greatly. This can be observed in table 4 and in figure 12, where again runs 1 and 6 are compared. The differences are now clearly visible, notwithstanding the very similar performance on the learning set.

We are hence facing the problem that the system is not robust on the testing set. The neural network performs badly on unseen data, or in other words, it does not generalize.

[Two charts: price (30 to 75) against date (20/11/2007 to 17/10/2011) with the trades of the system. (a) Run 1, (b) Run 6.]

Figure 12: Buy and sell signals over the testing period of two independent runs on the Delhaize share

This lack of robustness of the trading system, which has until now been exemplified for the Delhaize share, is of course also observed for the other test candidates. An intuitive way to express the lack of reproducible results is the correlation between the profits on the shares over different runs. The average of the pairwise correlations of the annualized profit over three independent runs on all 30 shares on the learning set is about 97% when the “reduced” trailing system of the objective function is used. Introducing the complete trailing system lets this correlation drop slightly to about 93%. The correlation can intuitively be interpreted as a measure of how confident one can be that the performance measures of the system (here the profit) will be reproduced over different runs. When the system is then applied to the testing set, this confidence measure drops to about 60%.⁴
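The robustness measure used here, the average pairwise correlation across runs, can be sketched as follows. Each run is represented by the list of annualized profits it achieved on the shares (same share order in every run); the numbers in the example are made up, and the use of the Pearson correlation is our assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(runs):
    """Average correlation over all pairs of runs."""
    pairs = list(combinations(runs, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical annualized profits (%) of three runs on four shares.
runs = [
    [12.0, -3.0, 8.0, 20.0],
    [11.0, -2.0, 9.0, 18.0],
    [-5.0, 14.0, 7.0, 3.0],
]
robustness = mean_pairwise_correlation(runs)
```

With three runs there are three pairs; a value close to 1 means that the same shares perform well (or badly) in every run.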

This illustrates, now on a global level, the problem of this approach. And even though a correlation of 60% might still sound reasonable, it makes the trading system completely useless in practice. Across the different runs that have been performed, it was always observed that a few shares perform very badly, with annual returns close to -20%. However, these bad results are not observed on the same shares each time. Hence one cannot conclude that the system, or more generally the approach, is simply not suited to some of the shares, which could then be excluded from the system. Instead, the shares on which the bad performance will be observed cannot be predicted in advance. A trader using this system will know in advance that the system will generate big losses on some of the shares. This makes the system unusable from a psychological point of view. The trader would stop the system on shares on which he observes that the system starts generating losses. By doing so, he might in fact exclude a share on which the system would have generated a profit in the end.

The absence of reproducible results on the testing set can be explained as follows. There are actually many combinations of weights which correspond to (a configuration close to) the global optimum. In fact, as observed in table 4, on average only 10 trades (that is 10 buy and 10 sell signals) were generated over the training period. One can understand that, in a neural net with close to 100 weights, many different configurations can generate the same signals.

As the signals are generated by passing the thresholds defined for the two output nodes, slightly modifying the weights affects the values of the two output nodes, but not necessarily the signals that are generated. In other words, the global optimum does not correspond to some precise values, but rather to some (interdependent) intervals in which the weights must be located. This means that the search space must be very flat around this global optimum. As the signals which are generated are not affected when the weights are slightly modified, the objective function is not affected either.
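This plateau effect can be reproduced on a deliberately tiny example. The sketch below is not the network of this report: it uses a single made-up input, one weight per output node and a logistic activation, and only keeps the validation thresholds 0.55 and 0.45 from the text.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def signals(w_buy, w_sell, inputs, high=0.55, low=0.45):
    """Thresholded signals of a toy two-output 'network' for each input value."""
    result = []
    for x in inputs:
        buy, sell = sigmoid(w_buy * x), sigmoid(w_sell * x)
        if buy > high and sell < low:
            result.append("buy")
        elif sell > high and buy < low:
            result.append("sell")
        else:
            result.append(None)
    return result


inputs = [-2.0, -0.1, 0.1, 2.0]       # made-up indicator values
base = signals(1.0, -1.0, inputs)     # reference weights
nearby = signals(1.05, -0.95, inputs) # slightly perturbed weights
far = signals(3.0, -3.0, inputs)      # strongly perturbed weights
```

The slightly perturbed weights change the output values but leave every signal untouched, so any objective function built on the signals is constant in that neighbourhood; only the large perturbation moves outputs across a threshold and changes the signals.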

Unfortunately it is not possible to represent the whole search space, as it has a dimension close to 100. However, figure 13 illustrates the flatness of the search space around the global optimum very well. It shows the value of the objective function on the training set when two of the weights are varied around the values obtained by the optimization algorithm. More precisely, it corresponds to the weights returned by run 2 of table 4, the run with the highest objective function value. The weights that are modified should ideally be weights that indeed have an impact on the output of the network. Many weights in the network in fact take very small values, which simply indicates that the corresponding input is not taken into consideration. Slightly modifying those weights would not affect the signals generated by the network. To avoid this, two inputs have been chosen which should determine the output of the net. Those inputs are the %B and the MFI, the indicators which performed best individually. The precise weights which are modified are, for each of those inputs, the hidden-layer weight with the highest absolute value. The global optimum which has been returned by the learning algorithm corresponds of course to no variation of the weights ($\delta w_{\%B} = \delta w_{MFI} = 0$).

[Surface plot “Search space”: objective function Φ (1 to 3.5) against $\delta w_{\%B}$ and $\delta w_{MFI}$, each varied from −0.1 to 0.1.]

Figure 13: run 2

⁴ NN.xlsx>>full.Nh6

It can indeed be observed that the search space is completely flat around the global optimum. The objective function takes the form of “stairs”. On each plateau, the signals which are generated are identical. When the objective function drops, it simply means that the signals are modified.

As the global optimum does not correspond to an exact configuration of the weights, the networks obtained over the different runs were indeed not the same. This explains why the networks resulting from different learning runs had a similar behavior on the training data and still behaved differently on the testing data.

6.3 Potential remedies against the low robustness

This subsection presents several attempts to overcome the problem encountered with the approach presented here.

6.3.1 Grouped training

The major cause of the lack of robustness is, as mentioned, that only a few signals determine the value of the objective function and that the search space is too flat at the optimal configuration. One obvious remedy would be to train the network on a longer period. This way, more signals are generated during the learning phase, which would also make the configurations corresponding to the optimum narrower. In other words, the likelihood that the signals over the learning period are modified when the weights are modified increases with the length of the learning phase. On average, our learning phase includes 1’825 trading days, which corresponds to about 7 years of data. The problem with taking longer learning periods is that the microstructure of the markets changes over such long periods. Over the last decade the liquidity and the volatility of financial markets have changed fundamentally. Training the network on very old data would make it ill-suited to the testing data or to the current market.

Yet, the length of the training period can be extended in another way. A network could be trained for several shares at the same time. The training period length becomes the sum of the training period lengths of the different shares. Of course, training the network on several shares makes it less specific to the characteristics of the individual shares. Therefore the shares should themselves have similar characteristics. On the other hand, training on several shares is not necessarily bad. It might reduce the problem of over-fitting the network to the data. It might for instance be that the training period of one of the shares includes a very rare event (e.g. an accident in a factory). Training the network only on this share would adapt the network to the specific pattern of this event, which is not desired, as such a rare event is not likely to be observed again in the future. Training the network on several shares reduces the risk of over-fitting the network to specific patterns in the learning set.

In order to have shares with at least some similar characteristics, different groups are formed according to sectors. This is justified by the fact that shares in the same sector are frequently correlated and have the same risk. For instance, the β measure of risk introduced above is generally computed on a sector level. The groups are the following:

IT solutions (5): Accenture Plc. (ACN), BMC Software Inc. (BMC), International Business Machines Corp. (IBM), Oracle Corporation (ORCL), Cisco Systems Inc. (CSCO)

(Petro-)Chemicals (4): E. I. Du Pont De Nemours & Co. (DD), The Dow Chemical Company (DOW), Solvay SA. (SOLB), Exxon Mobil Corporation (XOM)

Financial services (4): JPMorgan Chase & Co. (JPM), KBC Groep NV (KBC), PNC Financial Services (PNC), American Express Company (AXP)

Retail & Food (4): Delhaize Group (DELB), Target Corporation (TGT), Kraft Foods Inc. (KFT), McDonald’s Corporation (MCD)

Hardware & Technology (3): Dell Inc. (DELL), General Electric Company (GE), Hewlett-Packard Company (HPQ)

Telecommunications (2): Belgacom SA. (BELG), Mobistar SA. (MOBB)

Internet (2): Amazon.com Inc. (AMZN), Google Inc (GOOG)

Logistics (2): FedEx Corporation (FDX), United Parcel Service Inc. (UPS)

Health Care (2): Abbott Laboratories (ABT), Baxter International Inc. (BAX)

Investment (2): Ackermans & van Haaren NV. (ACKB), Franklin Resources Inc. (BEN)

The objective function Φ is simply composed of the individual objective functions $\Phi_i$ as defined in equation (14). On the one hand, the individual objective functions should be as high as possible. Therefore, we simply reward the sum of all individual objective functions. On the other hand, the network should perform reasonably well on all shares at the same time during the learning phase. It should not be possible to compensate bad performance on some shares by very good performance on others. Usually a product of the objective functions would be included for this purpose. But as the objective function takes values slightly above or below 0, this direct approach would not work. It can be considered that the network starts to perform well from the moment the objective function passes above -10. Therefore a product is included whose factors are the objective functions augmented by 10 if this sum is positive; otherwise the factor is set to zero. Finally the global objective function is:

\[
\Phi = \sum_i \Phi_i + \prod_i \max(\Phi_i + 10,\ 0) \tag{15}
\]
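The effect of the product term in equation (15) is easiest to see on a small numerical sketch. The function name and the example values are illustrative only.

```python
from math import prod


def grouped_objective(phis, shift=10.0):
    """Equation (15): sum of the individual objectives plus the product of
    the shifted, non-negative individual objectives."""
    return sum(phis) + prod(max(phi + shift, 0.0) for phi in phis)


both_good = grouped_objective([2.0, 3.0])       # 5 + 12 * 13
one_failing = grouped_objective([2.0, -15.0])   # -13 + 12 * 0
```

As soon as one share falls below -10, its factor becomes zero and the whole product vanishes, so that a good result on the other shares cannot compensate for it.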

It was observed that grouping two shares together is not sufficient to have the desired effect. In fact, the correlations are similar to those observed on individual shares (i.e. 96% on the training set with the full trailing system and around 50% on the testing set). But when the groups get larger, the problem of robustness now appears even on the training set. Despite the second term in the objective function (15), the system is now incapable of reproducing the same results over the training set. In fact, the values of the individual objective functions differ over the different runs. It is not possible to guarantee a reproducible outcome of the learning algorithm when several shares are included. The correlation over different runs on the shares which are in groups including at least four shares dropped to 50% on the learning set. The problem is that it is extremely difficult to find a weight configuration that performs well on all shares. The initialized chromosomes mostly perform very badly on most of the shares. With so many bad chromosomes in the population, the genetic algorithm has great difficulty increasing the average fitness of the population, even when the parent selection is made among the best 20% of the population. Of course, additional features such as increasing the number of iterations or adding more restarts could possibly help to overcome these algorithmic problems. But the problem is more likely to be located elsewhere. The fact that the training with grouped shares is so difficult indicates that having one network for more than 3 shares is not adequate, despite the fact that those shares are in the same sector. It seems that every share still has its own specific pattern. This approach hence does not lead to the desired result either.⁵

6.3.2 Control via the objective function

It has also been verified that the problem of low robustness is not due to a badly defined objective function. The term $\|w\|_2^2$, which is the punishment of the sum of the squares of the weights, is important for the resulting configuration of the network. Its role is to keep the values of the weights as small as possible. It also ensures that inputs which are not taken into account for a specific (hidden or output) node have a small weight. More precisely, the outputs of the hidden nodes are in fact a non-linear combination of the indicators. But as not all indicators are eventually important for a specific node, some of the weights should be close to zero. One issue that might arise is that the weight of an indicator which is “unused” during the learning phase is still sufficiently high to affect the signals of the neural net during the testing phase. Such noise signals should indeed be avoided by introducing the punishment of the size of the weights. As a matter of fact, it was observed that when the factor δ associated with the term $\|w\|_2^2$ was decreased by only one order of magnitude (×10^−1), the correlation between the annualized profits over several runs on the testing set dropped to 0% on average (while the correlation on the learning set was unaffected), which simply means that the performance on the testing set became completely random.⁶ The punishment of $\|w\|_2^2$ is hence indeed important for the robustness of the system, but it does not help to overcome the problem, as it does not address the cause stemming from the flatness of the search space around the optimum.

⁵ NN.xlsx>>Grouped

This raised the idea of another way to control the structure of the network, and possibly its robustness, via the objective function. The punishment of the sum of the squared weights ($\|w\|_2^2$) is in fact an L2-measure of the size of the weights. The corresponding L1-measure is the sum of the absolute values of the weights, $\|w\|_1$. The difference between those two measures is that the L2 is more influenced by the highest weights in the net, while the L1 treats weights of all magnitudes equally. Introducing the L1-measure in the objective function is expected to have the following beneficial effect: as the L1 does not only focus on the highest weights, it is expected to further decrease the aforementioned “unused” weights of the network, and this way to further reduce the “noise” signals appearing during the testing phase.⁷

The objective function (14) was thus modified to include the following terms:

\[
\Phi(w, p, I) = \cdots - \delta_1 \|w\|_1 - \delta_2 \|w\|_2^2 \tag{16}
\]

where:

δ1 = 0.05
δ2 = 0.25

As expected, it was observed that this brought the smallest weights closer to zero by one order of magnitude (×10^−1). But again the robustness of the system was not affected, as the problem of search space flatness was not solved.
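The different behaviour of the two penalty terms of equation (16) on large and on nearly “unused” weights can be made explicit with a small sketch, using the values of δ1 and δ2 given above. The function name and the two example weights are ours.

```python
def penalty_terms(weights, delta1=0.05, delta2=0.25):
    """Return the L1 and the squared-L2 penalty of equation (16) separately."""
    l1 = delta1 * sum(abs(w) for w in weights)
    l2 = delta2 * sum(w * w for w in weights)
    return l1, l2


large = penalty_terms([1.0])   # one large weight
small = penalty_terms([0.01])  # one nearly "unused" weight
```

For the large weight the squared term dominates, whereas for the small weight the L1 term is the larger of the two. This is why the L1 term keeps pushing small weights towards zero where the squared term has almost no effect left.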

6.3.3 Complexity of the network

A last approach that has been studied concerns the complexity of the network. It has been tested whether including fewer indicators in the input and reducing the number of hidden nodes had an impact on the robustness of the system. Two systems have been considered which include the most promising indicators (in order to avoid noise from bad indicators). The first system is based on the %B, the MFI and the MACD histogram.⁸ The second is based on the ADX, the difference between DI+ and DI-, the %B, the MFI and the slow %D.⁹ Both used 3 hidden nodes. This means that the number of weights has been reduced from previously 74 to now 20 and 26 respectively. Yet, the problem of robustness, as expressed by the correlations, remains identical.

⁶ NN.xlsx>>L2reduced
⁷ NN.xlsx>>L1

7 Conclusion

7.1 Why technical indicators and neural networks do not work together

Unfortunately we must conclude that the approach which is defined here did not lead to a usable trading system. One reason has been explained above. To conclude this report, some other arguments are added to explain why technical indicators and neural networks will probably not work together:

1. As mentioned before, the system which was obtained is not robust and can hence not be used as such for psychological reasons. All attempts to overcome this problem failed. Even though the results on the training set cannot be interpreted as a consequence, it should be mentioned that none of the systems significantly beat the market.

2. The system is also unusable for another psychological reason. Because of its complexity, the system is in fact very opaque. One cannot know what precisely the trading signals are based on. No trader would use a system which he does not understand.

3. Lastly, the approach which had to be used is quite complicated and heavy. On the one hand, the objective function that was necessary to obtain acceptable training results is not a simple expression and includes many parameters. On the other hand, the computational complexity of the genetic algorithm is considerable. It needs many iterations, with many evaluations of the also very heavy objective function, to converge to the global optimum. As mentioned, testing the system on all 30 shares takes about 4 hours on a modern computer.¹⁰ In addition, at least two runs are needed to verify the robustness. Optimizing the parameters, the structure of the network and the indicators to be considered is hence a very time consuming task. A major problem is also that the usually very efficient back-propagation algorithm cannot be used.

Of course, this report should not be considered as a “proof” that one cannot base a system on indicators and neural nets. Proving such an impossibility is of course not possible in itself. Rather, this report shows the difficulties of such an approach. It has shown that a simple back-propagation algorithm can probably not be applied. It proposes an objective function designed to overcome the flaws of the back-propagation approach and provides a well-performing genetic algorithm to solve the global optimization. It also raised the issue of the robustness of such a system. The most promising method to overcome this problem seems to be to group several shares together. In order to implement a system based on neural networks and technical indicators, additional research could determine under which conditions one network can be used for several shares. This report indicates that being in the same sector is not sufficient.

⁸ NN.xlsx>>simple1

⁹ NN.xlsx>>simple2

¹⁰ Intel i5 with 2.3GHz and 4GB RAM; coded in Matlab

Another promising approach might be to combine different networks, more precisely through bagging, which is used when a classifier is unstable, in order to improve the robustness. Networks obtained over several runs of the algorithm could possibly be aggregated to form one trading system. The goal of further research would then be to determine how those networks should be aggregated and whether this can help to overcome the problem of robustness.
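As an illustration of what such an aggregation could look like, the sketch below averages the outputs of several already trained networks and converts the average into a trading signal. This is only one possible aggregation rule and was not tested in this project. It covers only the aggregation step, not the bootstrap resampling of full bagging. The function `aggregate_signals`, the output range [-1, 1], the threshold and the stand-in networks are all assumptions made for the example.

```python
from statistics import mean


def aggregate_signals(networks, indicators, threshold=0.5):
    """Average the outputs of several networks and map the mean to a signal.

    Assumptions (hypothetical, for illustration only): each network is a
    callable that maps a list of indicator values to one number in [-1, 1],
    where positive values favour buying and negative values favour selling.
    """
    outputs = [network(indicators) for network in networks]
    average = mean(outputs)
    if average > threshold:
        return "buy"
    if average < -threshold:
        return "sell"
    return "hold"


# Stand-ins for networks obtained over three runs of the genetic algorithm.
# Real networks would compute their output from the indicator values.
run_1 = lambda indicators: 0.9
run_2 = lambda indicators: 0.8
run_3 = lambda indicators: 0.1

signal = aggregate_signals([run_1, run_2, run_3], indicators=[0.3, 55.0, -0.2])
print(signal)
```

A single run that disagrees with the others (here `run_3`) no longer decides the signal on its own, which is the intended effect on robustness. Whether averaging, voting or another rule works best is exactly the open question raised above.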

However, the chances of success seem very low. If one is absolutely keen on using neural networks and technical indicators, one should rather turn to an approach which is well developed in the literature [2, 8, 13]. This approach uses lagged prices as input and has the predicted price(s) over a very short period as output. It has two fundamental advantages over the approach presented here. First, as the predicted prices are observed during the training phase, back-propagation can be used. This makes the training computationally more efficient and should also help to overcome the problem of robustness, as the correctness of the predictions can be assessed for each day (which increases the “precision” of the obtained network) instead of just aggregating the signals of the whole period into one very complex and global objective function. Second, such a system would also overcome the psychological issues. The predicted price over a very short period is more tangible than some abstract buy and sell indicators. Technical indicators can be used to complement the input of such a neural network [6, 7].
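To make the structure of this alternative concrete, the sketch below builds supervised training samples from a price series: each input is a window of lagged prices and each target is the price observed a short horizon later. Because every sample has an observed target, an error can be computed per day, which is what makes back-propagation applicable. The function name `make_lagged_samples`, the window length and the example prices are assumptions chosen for illustration and do not come from the cited papers.

```python
def make_lagged_samples(prices, n_lags, horizon=1):
    """Turn a price series into (lagged inputs, target price) pairs.

    prices  : list of closing prices, oldest first
    n_lags  : number of past prices used as network input (assumed value)
    horizon : how many days ahead the target lies (1 = next day)
    """
    samples = []
    for t in range(n_lags, len(prices) - horizon + 1):
        inputs = prices[t - n_lags:t]        # the n_lags most recent known prices
        target = prices[t + horizon - 1]     # the price to be predicted
        samples.append((inputs, target))
    return samples


# Hypothetical price series, for illustration only
prices = [10, 11, 12, 13, 14]
for inputs, target in make_lagged_samples(prices, n_lags=3):
    print(inputs, "->", target)
```

Technical indicators computed on the same window could be appended to each input list, in the spirit of [6, 7].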

7.2 A personal comment on efficient markets

One question which still remains open is who is right: fundamentalists and technical analysts, or the theory of efficient markets? Are stock prices predictable or purely random?

I think these views are not necessarily mutually exclusive. Before starting this project I did not believe that technical indicators convey information about future prices. What made me doubt this view was an observation I made: during the development and improvement phase of the genetic algorithm I observed a correlation of 50% between the performance on the training set and that on the testing set¹¹. This is very abnormal, as the market performance in one period is not related to the market performance in another period in the long run. The correlation was due to the fact that the genetic algorithm was not yet fully developed and that, as a consequence, not all networks were correctly trained, for instance because they got stuck in local optima. Put differently, taking the technical indicators badly into account resulted in a performance which was far worse than the market. As the genetic algorithm became fully developed, so that the training on all shares was successful and not random anymore, this correlation disappeared naturally. This confirms that there is indeed some information in the indicators.

¹¹ NN.xlsx>>badTraining
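The correlation referred to here is a correlation across shares between two performance figures. A minimal sketch of how such a figure can be computed is given below, using the Pearson correlation coefficient. The function name `pearson` and the performance numbers are hypothetical and are not the data stored in NN.xlsx.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical returns per share (one entry per share), for illustration only
training_performance = [0.12, -0.05, 0.30, 0.08, -0.15]
testing_performance = [0.04, -0.02, 0.10, -0.01, -0.09]

print(round(pearson(training_performance, testing_performance), 2))
```

A value near zero is what one would expect when all networks are trained properly; a clearly positive value, as observed during development, indicates that badly trained networks performed poorly in both periods.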

On the other hand, the Efficient Market Hypothesis should not be overinterpreted. To me, the Efficient Market Hypothesis does not mean that prices are completely random. Rather, it should be reformulated as: “There is no free lunch”. The question is hence how complicated and how successful the extraction of the underlying pattern is. With this in mind, one should speak about the reward of a trading system rather than its return, notwithstanding that both terms are frequently used as synonyms. One should also take into account the costs and the effort of developing a trading system. Essentially, developing a trading system is very complicated and requires extensive knowledge of the tools which are used for this. In my opinion, a trading system might beat the return of the market. But will it also reward its developer more than if he had put his effort or applied his knowledge elsewhere? One can cite here a study published in the Journal of Finance in 2000 [12]. In this article, Wermers shows that mutual funds are able to beat the stock market by 1.3%. However, 1.6% must be subtracted from this for trading costs and other expenses. Put differently, the experts beat the returns of the market only by a margin which barely rewards them for their costs and which, by these figures, does not fully cover them.


References

[1] John Ehlers and Ric Way. Evaluating trading systems. www.mesasoftware.com.

[2] Fernando Fernández-Rodríguez, Christian González-Martel, and Simón Sosvilla-Rivero. On the profitability of technical trading rules based on artificial neural networks: Evidence from the Madrid stock market. Economics Letters, 2000.

[3] Lonnie Hamm, B. Wade Brorsen, and Martin T. Hagan. Comparison of stochastic global optimization methods to estimate neural network weights. Neural Processing Letters, 26(3), 2007.

[4] Google Inc. Google finance. www.google.com/finance.

[5] Investopedia. Basics of trading systems, 2006.

[6] Kyoung-jae Kim and Ingoo Han. Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications, 19(2), August 2000.

[7] Monica Lam. Neural network techniques for financial performance prediction: integrating fundamental and technical analysis. Decision Support Systems, 37(4), 2004.

[8] William Leigh, Russell Purvis, and James M. Ragusa. Forecasting the NYSE composite index with technical analysis, pattern recognizer, neural network, and genetic algorithm: a case study in romantic decision support. Decision Support Systems, 32, 2002.

[9] Frank H. F. Leung, H. K. Lam, S. H. Ling, and Peter K. S. Tam. Tuning of the structure and parameters of a neural network using an improved genetic algorithm. IEEE Transactions on Neural Networks, 14(1), January 2003.

[10] David J. Montana and Lawrence Davis. Training feedforward neural networks using genetic algorithms. Proceedings of the 11th International Joint Conference on Artificial Intelligence, 1989.

[11] Michele Pace. Algorithmic trading project notes & guidelines, 2012.

[12] Russ Wermers. Mutual fund performance: An empirical decomposition into stock-picking talent, style, transactions costs, and expenses. The Journal of Finance, 55(4), 2000.

[13] Jingtao Yao, Chew Lim Tan, and Hean-Lee Poh. Neural networks for technical analysis: A study on KLCI. International Journal of Theoretical and Applied Finance, 2(2), May 1999.
