
Expert Systems with Applications 39 (2012) 8865–8877

Forecasting and trading the EUR/USD exchange rate with Gene Expression and Psi Sigma Neural Networks

Georgios Sermpinis a,*, Jason Laws b, Andreas Karathanasopoulos c, Christian L. Dunis d

a University of Glasgow Business School, University of Glasgow, Gilbert Scott Building, Glasgow G12 8QQ, United Kingdom
b University of Liverpool Management School, The University of Liverpool, Chatham Street, Liverpool L69 7ZH, United Kingdom
c London Metropolitan Business School, London Metropolitan University, London NE7 8DB, United Kingdom
d Liverpool Business School, JMU, John Foster Building, 98 Mount Pleasant, Liverpool L3 5UZ, United Kingdom

Article info

Keywords:
Genetic Expression
Psi Sigma Networks
Recurrent networks
Multi-Layer Perceptron networks
Quantitative trading strategies
Genetic programming


Abstract

The motivation for this paper is to investigate the use of two promising classes of artificial intelligence models, the Psi Sigma Neural Network (PSI) and the Gene Expression algorithm (GEP), when applied to the task of forecasting and trading the EUR/USD exchange rate. This is done by benchmarking their results with a Multi-Layer Perceptron (MLP), a Recurrent Neural Network (RNN), a genetic programming algorithm (GP), an autoregressive moving average model (ARMA) and a naïve strategy. We also examine if the introduction of a time-varying leverage strategy can improve the trading performance of our models.


1. Introduction

Neural networks and genetic programming algorithms are emergent technologies with an increasing number of real-world applications, including finance (Lisboa & Vellido, 2000; Chen, 2002). However, their numerous limitations and the contradictory empirical evidence around their forecasting power often create scepticism about their use among practitioners. This scepticism is further fuelled by the fact that the selection of each algorithm's inputs is based more on trial and error and the practitioner's market knowledge than on a formal statistical procedure.

The motivation for this paper is to investigate the financial forecasting performance of two promising classes of artificial intelligence models, the Psi Sigma (PSI) Neural Network and the Gene Expression algorithm (GEP), which try to overcome some of these limitations and have provided some promising empirical evidence in pattern recognition in other fields of science. This is done by benchmarking their trading performance over the EUR/USD daily fixing by the ECB with a Multi-Layer Perceptron (MLP), a Recurrent Neural Network (RNN), a genetic programming algorithm (GP), an autoregressive moving average model (ARMA) and a naïve trading strategy. Moreover, we explore if the application of a time-varying leverage strategy based on RiskMetrics volatility forecasts can increase the trading efficiency of our models.

The main reason behind our decision to use the EUR/USD daily fixing by the ECB is that it is possible to leave orders with a bank and trade on that basis. It is therefore a tradable quantity, which makes our trading simulation more realistic.

As it turns out, both the Psi Sigma and the Genetic Expression algorithms do remarkably well and outperform all other models before and after the application of the trading strategy. It seems that their unique architectures give them an edge in our forecasting competition over the more 'traditional' MLP, RNN and GP models. This empirical evidence can be added to the limited bibliography on the superiority of GEP and Psi Sigma in pattern recognition in science more generally.

We also note that after the application of time-varying leverage, all our models except ARMA demonstrate a substantial increase in their trading performance in terms of annualised return.

The rest of the paper is organised as follows. In Section 2, we present the limited extant literature relevant to the Psi Sigma and the Genetic Expression algorithm. Section 3 describes the dataset used for this research and its characteristics. An overview of the different neural network models, genetic programming algorithms and statistical techniques is given in Section 4. Section 5 gives the empirical results of all the models considered and investigates the possibility of improving their performance with the application of a sophisticated trading strategy, while Section 6 provides some concluding remarks.

2. Literature review

The motivation for this paper is to apply two of the most promising artificial intelligence models, the Psi Sigma Neural Network and the Gene Expression algorithm (GEP), which have been developed recently with the purpose of overcoming the numerous limitations of more classic neural and genetic programming architectures, and to assess whether they can achieve a higher performance in a trading simulation of the EUR/USD ECB fixing exchange rate. Although the unique architecture and characteristics of both Psi Sigma and Gene Expression seem promising in pattern recognition, there is only limited empirical evidence regarding their forecasting power.

Psi Sigma networks were first introduced by Shin and Ghosh (1991) as a neural architecture capable of capturing higher order correlations within the data while avoiding some of the Neural Networks (NNs) limitations, such as the combinatorial increase in weight numbers. Shin and Ghosh (1991) and Ghosh and Shin (1992) demonstrate these benefits and present empirical evidence of their forecasting superiority in function approximation when compared with an MLP network and a Higher Order Neural Network (HONN). Ghazali, Hussain, and Merabti (2006) compare them with a HONN and an MLP network on the task of forecasting and trading the IBM common stock closing price and the US 10-year government bond series; Psi Sigma outperformed both benchmarks in terms of statistical accuracy and annualised return. In a similar paper, Hussain, Ghazali, and Al-Jumeily (2006) present satisfactory results on the Psi Sigma forecasting power for the EUR/USD, the EUR/GBP and the EUR/JPY exchange rates, with a HONN model as benchmark. On the other hand, in Dunis, Laws, and Sermpinis (2011), who also study the EUR/USD series with Psi Sigma against MLP, RNN and HONN benchmarks, the network failed to outperform its benchmarks in a simple trading application.

GEP was first introduced by Ferreira (2001). Ferreira (2001, 2006) concludes that GEP is the latest addition to a family of evolutionary algorithms, providing financial practitioners with further insight into artificial intelligence and remedying the shortfalls attributed to traditional Genetic Algorithms (GAs) and genetic programming algorithms (GP). In comparison with GA and GP applications, the GEP's superiority lies in the fact that it clearly distinguishes between the genotype1 and the phenotype2 of individuals within a population. For instance, whilst a traditional GA represents individuals as symbolic strings of fixed size (i.e. chromosomes) and GP represents its individuals as non-linear entities of different shapes and sizes (tree-like structures), GEP encompasses a combination of both. Hence Ferreira (2001) stresses that GEP represents not only an individual's genotype, in the form of chromosomes, but also its phenotype, as a tree-like structure of expressions, in order to establish fitness. GEP is a new evolutionary technique and its applications are still quite limited: so far, it has been successfully applied to problems in mining and computing (Dehuri & Cho, 2008; Lopez & Weinert, 2004; Margny & El-Semman, 2005).

1 The genotype is an individual's full hereditary information, representing its exact genetic makeup.
2 The phenotypes are the observed properties of an individual, such as development and behaviour. These are largely determined by an individual's genotype.

3. The EUR/USD exchange rate and related financial data

The European Central Bank (ECB) publishes a daily fixing for selected EUR exchange rates: these reference mid-rates are based on a daily concertation procedure between central banks within and outside the European System of Central Banks, which normally takes place at 2.15 p.m. ECB time. The reference exchange rates are published both by electronic market information providers and on the ECB's website shortly after the concertation procedure has been completed. Although only a reference rate, many financial institutions are ready to trade at the EUR fixing and it is therefore possible to leave orders with a bank for business to be transacted at this level.

The ECB daily fixing of the EUR/USD is therefore a tradable level, which makes using it a more realistic alternative to, say, London closing prices, and this is the series that we investigate in this paper.3 We examine the ECB daily fixing of the EUR/USD from 3 September 2001 until 31 August 2010. The data period is partitioned as in Table 1.

Table 1
The EUR/USD dataset.

Name of period                          Trading days  Beginning           End
Total dataset                           2273          3 September, 2001   31 August, 2010
Training dataset                        1893          3 September, 2001   27 February, 2009
Out-of-sample dataset [Validation set]  380           3 March, 2009       31 August, 2010

3 EUR/USD is quoted as the number of USD per Euro: for example, a value of 1.2657 is USD 1.2657 per 1 Euro. We examine the EUR/USD from 3 September 2001 until 31 August 2010.

Fig. 1 shows the total dataset for the EUR/USD and its volatile trend since early 2008.

The observed EUR/USD time series is non-normal (the Jarque–Bera4 statistic confirms this at the 99% confidence level), containing slight skewness and high kurtosis. It is also non-stationary and hence we decided to transform the EUR/USD series into a stationary daily series of rates of return5 using the formula:

R_t = \frac{P_t}{P_{t-1}} - 1    (1)

where R_t is the rate of return and P_t is the price level at time t.

The summary statistics (see Fig. 2) of the EUR/USD returns series reveal a slight negative skewness and high kurtosis. The Jarque–Bera statistic confirms again that the EUR/USD return series is non-normal at the 99% confidence level.

4 For a full discussion of the statistics refer to Bera and Jarque (1981).
5 Confirmation of its stationary property is obtained at the 1% significance level by both the Augmented Dickey Fuller (ADF) and Phillips–Perron (PP) test statistics.

In the absence of any formal theory behind the selection of the inputs of a neural network and a genetic programming algorithm, we conducted some neural network and genetic programming experiments on the training dataset in order to guide our decision. Based on these experiments, we selected as inputs a set of autoregressive terms of the EUR/USD, EUR/GBP and EUR/JPY ECB fixing exchange rate returns, which is presented in Table 2.6

6 We also explored as inputs autoregressive terms of other exchange rates (e.g. the ECB fixings of USD/JPY and GBP/JPY), commodities prices (e.g. Gold Bullion and Brent Oil) and stock market prices (e.g. FTSE100 and NYMEX). However, the set of inputs presented in Table 2 gave our NNs and GP algorithms the highest trading performance in the training period and was thus retained.

In order to train our neural networks we further divided our dataset as in Table 3.

4. Forecasting models

4.1. Benchmark models

In this paper, we benchmark our Psi Sigma and Genetic Expression models with MLP, RNN and genetic programming algorithms, an ARMA model and a naïve strategy.



[Figure 1 omitted: line chart of the EUR/USD fixing from 3 September 2001 to 31 August 2010, with the rate ranging roughly between 0.8 and 1.7.]
Fig. 1. EUR/USD Frankfurt daily fixing prices (total dataset).

[Figure 2 omitted: histogram of EUR/USD returns. Accompanying summary statistics, Series RETURNS, Sample 1–2273 (2273 observations): Mean 0.000164; Median 0.000205; Maximum 0.041203; Minimum −0.046251; Std. Dev. 0.006480; Skewness −0.005166; Kurtosis 6.328731; Jarque–Bera 1049.421 (Probability 0.000000).]
Fig. 2. EUR/USD returns summary statistics (total dataset).

Table 2
Explanatory variables.

Number  Variable                      Lag
1       EUR/USD exchange rate return  1
2       EUR/USD exchange rate return  2
3       EUR/USD exchange rate return  3
4       EUR/USD exchange rate return  4
5       EUR/USD exchange rate return  5
6       EUR/USD exchange rate return  9
7       EUR/GBP exchange rate return  1
8       EUR/GBP exchange rate return  2
9       EUR/JPY exchange rate return  1
10      EUR/JPY exchange rate return  2
11      EUR/JPY exchange rate return  3
12      EUR/JPY exchange rate return  4

4.1.1. Naïve strategy
The naïve strategy simply takes the most recent period change as the best prediction of the future change, i.e. a simple random walk. The model is defined by:

\hat{Y}_{t+1} = Y_t    (2)

where Y_t is the actual rate of return at period t and \hat{Y}_{t+1} is the forecast rate of return for the next period.

The performance of the strategy is evaluated in terms of trading performance via a simulated trading strategy.

4.1.2. ARMA model
Autoregressive moving average (ARMA) models assume that the value of a time series depends on its previous values (the autoregressive component) and on previous residual values (the moving average component).7

The ARMA model takes the form:

Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t - w_1 \varepsilon_{t-1} - w_2 \varepsilon_{t-2} - \cdots - w_q \varepsilon_{t-q}    (3)

where Y_t is the return at time t; Y_{t-1}, Y_{t-2}, ..., Y_{t-p} are the lagged returns; \phi_0, \phi_1, \phi_2, ..., \phi_p are regression coefficients; \varepsilon_t is the residual term; \varepsilon_{t-1}, \varepsilon_{t-2}, ..., \varepsilon_{t-q} are previous values of the residual; and w_1, w_2, ..., w_q are the residual weights.

Using as a guide the correlogram in the training and the test sub-periods, we have chosen a restricted ARMA(11,11) model. All of its coefficients are significant at the 95% confidence level. The null hypothesis that all coefficients (except the constant) are not significantly different from zero is rejected at the 95% confidence level (see Appendix A.1).

The selected ARMA model takes the form:

Y_t = 0.000187 + 0.395388 Y_{t-1} - 0.90320169 Y_{t-4} + 0.280795 Y_{t-5} + 0.229901 Y_{t-11} - 0.392357 \varepsilon_{t-1} + 0.916159 \varepsilon_{t-4} - 0.251859 \varepsilon_{t-5} - 0.219821 \varepsilon_{t-11}    (4)

The model selected was retained for out-of-sample estimation. The performance of the strategy is evaluated in terms of trading performance.

7 For a full discussion of the procedure, refer to Box, Jenkins, and Gregory (1994) or Pindyck and Rubinfeld (1998).
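For readers who want to reproduce the flavour of this step, a minimal sketch with statsmodels (our own illustration, not the authors' estimation code; it fits an unrestricted ARMA(11,11) on a placeholder series, whereas Eq. (4) additionally drops the insignificant intermediate lags):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Placeholder returns series; in the paper this would be the training
# sub-period of EUR/USD returns.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.0065, size=500)

# ARMA(p, q) is ARIMA(p, 0, q)
fitted = ARIMA(returns, order=(11, 0, 11)).fit()
print(fitted.forecast(steps=1))  # one-day-ahead return forecast
```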


Table 3
The neural networks datasets.

Name of period                          Trading days  Beginning           End
Total dataset                           2273          3 September, 2001   31 August, 2010
Training dataset                        1521          3 September, 2001   31 August, 2007
Test dataset                            372           3 September, 2007   27 February, 2009
Out-of-sample dataset [Validation set]  380           3 March, 2009       31 August, 2010

4.2. Neural networks

Neural networks exist in several forms in the literature. The most popular architecture is the Multi-Layer Perceptron (MLP).

A standard neural network has at least three layers. The first layer is called the input layer (the number of its nodes corresponds to the number of explanatory variables). The last layer is called the output layer (the number of its nodes corresponds to the number of response variables). An intermediary layer of nodes, the hidden layer, separates the input from the output layer. Its number of nodes defines the amount of complexity the model is capable of fitting. In addition, the input and hidden layers contain an extra node called the bias node. This node has a fixed value of one and has the same function as the intercept in traditional regression models. Normally, each node of one layer has connections to all the other nodes of the next layer.

The network processes information as follows: the input nodes contain the value of the explanatory variables. Since each node connection represents a weight factor, the information reaches a single hidden layer node as the weighted sum of its inputs. Each node of the hidden layer passes the information through a non-linear activation function and passes it on to the output layer if the calculated value is above a threshold.

The training of the network (which is the adjustment of its weights in the way that the network maps the input value of the training data to the corresponding output value) starts with randomly chosen weights and proceeds by applying a learning algorithm called backpropagation of errors8 (Shapiro, 2000). The learning algorithm simply tries to find those weights which minimize an error function (normally the sum of all squared differences between target and actual values). Since networks with sufficient hidden nodes are able to learn the training data (as well as their outliers and their noise) by heart, it is crucial to stop the training procedure at the right time to prevent overfitting (this is called 'early stopping'). This can be achieved by dividing the dataset into three subsets, respectively called the training and test sets, used for simulating the data currently available to fit and tune the model, and the validation set, used for simulating future values. The network parameters are then estimated by fitting the training data using the above mentioned iterative procedure (backpropagation of errors). The iteration length is optimised by maximising the forecasting accuracy for the test dataset. Then the predictive value of the model is evaluated by applying it to the validation dataset (out-of-sample dataset).

8 Backpropagation networks are the most common multi-layer networks and are the most commonly used type in financial time series forecasting (Kaastra & Boyd, 1996).
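A minimal sketch of the early-stopping logic described above (our own illustration; `train_epoch`, `test_error` and the `weights` attribute are assumed helpers standing in for the backpropagation step and test-set evaluation, which are not specified in the paper):

```python
def fit_with_early_stopping(network, max_iters, patience=50):
    """Track the test-set error during training and keep the weights
    that performed best on the test set ('early stopping')."""
    best_error, best_weights, since_best = float("inf"), None, 0
    for _ in range(max_iters):
        train_epoch(network)         # one backpropagation pass (assumed helper)
        error = test_error(network)  # error on the test sub-period (assumed helper)
        if error < best_error:
            best_error, best_weights, since_best = error, network.weights.copy(), 0
        else:
            since_best += 1
            if since_best >= patience:  # no test improvement for `patience` steps
                break
    network.weights = best_weights
    return network
```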

Since the starting point for each network is a set of random weights, forecasts can differ between networks. In order to eliminate any variance between our NN forecasts, we used the average of a committee of 10 NNs which presented the highest profit in the test sub-period. The characteristics of the NNs used in this paper are presented in Appendix A.2.
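The committee averaging (used again below for the GP and GEP committees) is straightforward; a minimal sketch with hypothetical names, assuming the per-model forecasts and test-period profits have already been computed:

```python
import numpy as np

def committee_forecast(forecasts, test_profits, k=10):
    """Average the forecasts of the k models with the highest test
    sub-period profit. forecasts: (n_models, n_days) array;
    test_profits: (n_models,) array."""
    forecasts = np.asarray(forecasts)
    best = np.argsort(test_profits)[-k:]  # the k most profitable models
    return forecasts[best].mean(axis=0)   # committee = simple average
```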

4.2.1. The Multi-Layer Perceptron model
The network architecture of a 'standard' MLP is presented in Fig. 3.9

9 The bias nodes are not shown here for the sake of simplicity.

[Figure 3 omitted: network diagram.]
Fig. 3. A single output, fully connected MLP model.

In Fig. 3, x_t^{[n]} (n = 1, 2, ..., k+1) are the model inputs (including the input bias node) at time t, h_t^{[m]} (m = 1, 2, ..., j+1) are the hidden node outputs (including the hidden bias node), \tilde{y}_t is the MLP model output, u_{jk} and w_j are the network weights, S is the transfer sigmoid function:

S(x) = \frac{1}{1 + e^{-x}}    (5)

and F is a linear function:

F(x) = \sum_i x_i    (6)

The error function to be minimised is:

E(u_{jk}, w_j) = \frac{1}{T} \sum_{t=1}^{T} (y_t - \tilde{y}_t(u_{jk}, w_j))^2    (7)

with y_t being the target value.

4.2.2. The recurrent network
The next model is the recurrent neural network. While a complete explanation of RNN models is beyond the scope of this paper, we present below a brief explanation of the significant differences between RNN and MLP architectures. For an exact specification of recurrent networks, see Elman (1990).

A simple recurrent network has an activation feedback which embodies short-term memory. The advantages of using recurrent networks over feedforward networks for modelling non-linear time series have been well documented in the past. However, as mentioned by Tenti (1996), "the main disadvantage of RNNs is that they require substantially more connections, and more memory in simulation, than standard backpropagation networks" (p. 569), thus resulting in a substantial increase in computational time. Having said this, RNNs can yield better results in comparison with simple MLPs due to the additional memory inputs.

A simple illustration of the architecture of an Elman RNN is presented in Fig. 4.

[Figure 4 omitted: network diagram.]
Fig. 4. Elman RNN architecture with two nodes on the hidden layer.

In Fig. 4, x_t^{[n]} (n = 1, 2, ..., k+1), U_t^{[1]} and U_t^{[2]} are the model inputs (including the input bias node) at time t, \tilde{y}_t is the recurrent model output, d_t^{[f]} (f = 1, 2) and w_t^{[n]} (n = 1, 2, ..., k+1) are the network weights, U_t^{[f]} (f = 1, 2) is the output of the hidden nodes at time t, S is the transfer sigmoid function:

S(x) = \frac{1}{1 + e^{-x}}    (8)

and F is the linear output function:

F(x) = \sum_i x_i    (9)

The error function to be minimised is:

E(d_t, w_t) = \frac{1}{T} \sum_{t=1}^{T} (y_t - \tilde{y}_t(d_t, w_t))^2    (10)

In short, the RNN architecture can provide more accurate outputs because the inputs are (potentially) taken from all previous values (see inputs U_{j-1}^{[1]} and U_{j-1}^{[2]} in Fig. 4).
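The Elman feedback loop can be made concrete in a few lines; a minimal sketch (our own illustration, biases omitted), showing how the previous hidden outputs re-enter the network as extra inputs:

```python
import numpy as np

def elman_forward(xs, w_in, w_rec, w_out):
    """Elman-style recurrence: the previous hidden outputs U_{t-1} are
    fed back as additional inputs. xs: (T, k) inputs; w_in: (h, k);
    w_rec: (h, h); w_out: (h,)."""
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for x in xs:
        # hidden state depends on the current inputs and on U_{t-1}
        h = 1.0 / (1.0 + np.exp(-(w_in @ x + w_rec @ h)))
        outputs.append(float(np.sum(w_out * h)))  # linear output, Eq. (9)
    return np.array(outputs)
```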

4.2.3. The Psi Sigma Network
Psi Sigma networks can be considered as a class of feedforward fully connected higher order neural networks. First introduced by Shin and Ghosh (1991), the Psi Sigma network utilises product cells as the output units to indirectly incorporate the capabilities of higher-order networks while using a smaller number of weights and processing units. Their creation was motivated by the need for a network combining the fast learning property of single layer networks with the powerful mapping capability of higher order neural networks, while avoiding the combinatorial increase in the required number of weights. The order of the network in the context of Psi Sigma is represented by the number of hidden nodes.

In a Psi Sigma network the weights from the hidden to the output layer are fixed to 1 and only the weights from the input to the hidden layer are adjusted, something that greatly reduces the training time. Moreover, the activation function of the nodes in the hidden layer is the summing function, while the activation function of the output layer is a sigmoid. Fig. 5 shows a Psi Sigma network with one output layer.

[Figure 5 omitted: an input layer with adjustable weights feeding a linear (summing) hidden layer, whose outputs reach a sigmoid output layer through weights fixed to 1.]
Fig. 5. A Psi Sigma network with one output layer.

In Fig. 5, x_n (n = 1, 2, ..., k+1) are the model inputs (including the input bias node), \tilde{y}_t is the Psi Sigma output, w_j are the adjustable weights,

h(x) = \sum_i x_i    (11)

is the hidden layer activation function, and

\sigma(x) = \frac{1}{1 + e^{-xc}}    (12)

is the output unit adaptive sigmoid activation function, with c the adjustable term. The error function to be minimised is:

E(c, w_j) = \frac{1}{T} \sum_{t=1}^{T} (y_t - \tilde{y}_t(w_k, c))^2    (13)

with y_t being the target value.

For example, let us consider a Psi Sigma network which is fed with an N+1 dimensional input vector x = (1, x_1, ..., x_N)^T. These inputs are weighted by K weight factors w_j = (w_{0j}, w_{1j}, ..., w_{Nj})^T, j = 1, 2, ..., K, and summed by a layer of K summing units, where K is the desired order of the network. So the output of the j-th summing unit h_j in the hidden layer is given by h_j = w_j^T x = \sum_{k=1}^{N} w_{kj} x_k + w_{0j}, j = 1, 2, ..., K, while the output \tilde{y} of the network is given by \tilde{y} = \sigma(\prod_{j=1}^{K} h_j) (in our case we selected as \sigma the sigmoid function of Eq. (12)). Note that by using products in the output layer we directly incorporate the capabilities of higher order networks with a smaller number of weights and processing units. For example, a k-th degree higher order neural network with d inputs needs \sum_{i=0}^{k} \frac{(d+i-1)!}{i!(d-1)!} weights if all products of up to k components are to be incorporated, while a similar Psi Sigma network needs only (d+1) \cdot k weights. Also note that the sigmoid function is neuron adaptive: as the network is trained, not only the weights but also c in Eq. (12) is adjusted. This strategy seems to provide better fitting properties and increases the approximation capability of a neural network by introducing an extra variable in the estimation, compared to classical architectures with sigmoidal neurons (Vecci, Piazza, & Uncini, 1998).
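The forward pass just described fits in a few lines; a minimal sketch (our own rendering of the description above, not the authors' code):

```python
import numpy as np

def psi_sigma_forward(x, W, c):
    """Psi Sigma forward pass: K summing units h_j = w_j^T x (the bias
    is folded into x[0] = 1), a product over the hidden outputs, then
    the adaptive sigmoid of Eq. (12). x: (N+1,) with x[0] = 1;
    W: (K, N+1) adjustable weights; c: the adjustable sigmoid term."""
    h = W @ x                  # h_j = w_j^T x, j = 1, ..., K
    net = np.prod(h)           # product unit: its weights are fixed to 1
    return 1.0 / (1.0 + np.exp(-net * c))

# A 3rd-order network (K = 3) on d = 12 inputs has only
# (d + 1) * K = 39 adjustable weights.
```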

The price for the flexibility and speed of Psi Sigma networks is that they are not universal approximators. We need to choose a suitable order of approximation (or else the number of hidden units) by considering the estimated function complexity, the amount of data and the amount of noise present. To overcome this, our code runs simulations for orders two to six and then presents the best network.

4.3. The genetic programming algorithm

For the purpose of our research, the GP application is coded and implemented to evolve tree-based structures that represent models (sub-trees) of input–output relationships (see Fig. 6). In the design phase of our GP application we focus primarily on execution time optimization as well as on limiting the 'bloat effect'. The bloat effect is similar to the issue of overfitting experienced in neural networks; however, in our case we run the risk of continuously increasing and expanding the tree size. This algorithm is run in a 'steady state', in that a single member of the population is replaced at a time. Furthermore, our GP application reproduces newer models replacing the weaker ones in the population according to their fitness. The decision to use a steady state algorithm is justified as such algorithms hold a greater selection strength and genetic drift over alternatives such as a typical generational GA. Additionally, steady state algorithms also offer exceptional multiprocessing capabilities (Ferreira, 2006).

In our application of genetic programming we utilize formulas to evolve algebraic expressions that enable the analysis/optimization of results in a 'tree like structure'. This genetic tree structure consists of nodes (depicted as circles in Fig. 6) which are essentially functions that perform actions within this structure. Furthermore, these functions are in place to generate output signals. On the other hand, the squares in the tree signify terminal functions, representing the end of a function once the most superior sub-tree (model) is achieved. For example, the tree structure in Fig. 6 is characterised by the algebraic expression 4.0/x1(t−1) + ln(x2(t−2)). In this example there are 3 terminal nodes, expressed by x1(t−1), x2(t−2) and 4.0, while the non-terminal nodes represent the functions, in our case /, ln and + (see Fig. 6). In the execution of the genetic algorithm it has to be understood that each individual in the population corresponds to a single sub-tree structure. Each of these sub-trees is limited by the predefined maximum tree size, set to 6 in our application.

[Figure 6 omitted: example expression tree.]
Fig. 6. Example of a tree structure.
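The expression tree above can be mirrored in a few lines of code; a minimal sketch (our own illustration; the node class and input names are hypothetical):

```python
import math

# Circles are function nodes, squares are terminals, as in Fig. 6.
class Node:
    def __init__(self, op=None, children=(), value=None, var=None):
        self.op, self.children, self.value, self.var = op, children, value, var

    def eval(self, inputs):
        if self.op is None:  # terminal: a constant or a lagged variable
            return self.value if self.var is None else inputs[self.var]
        args = [c.eval(inputs) for c in self.children]
        return {"+": lambda a, b: a + b,
                "/": lambda a, b: a / b,
                "ln": lambda a: math.log(a)}[self.op](*args)

# The tree for 4.0 / x1(t-1) + ln(x2(t-2)):
tree = Node("+", (Node("/", (Node(value=4.0), Node(var="x1_lag1"))),
                  Node("ln", (Node(var="x2_lag2"),))))
print(tree.eval({"x1_lag1": 2.0, "x2_lag2": math.e}))  # 4/2 + ln(e) = 3.0
```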

Koza (1998) summarises the functionality aspect of the GP algorithm in the following steps:

(1) The generation of an initial population of randomly constructed models is developed, with each model being represented in a tree-like structure as discussed previously. Additionally, the evolutionary algorithm represents each chromosome of the population as a tree of variable length (i.e. total number of functions and terminals) or a maximum depth of the model tree. The process of randomly reproducing each variable of the population is completed once all of these functions of the tree are terminal symbols. However, until the process is halted by these 'terminal symbols', the tree-like structure of chromosomes continues to multiply (grow) with each generation, as the population expands to include not only the parents but also their offspring. This is achieved by crossover and mutation operators. On the whole, it also has to be understood that the majority of the models produced in the initial population are, in most cases, unsatisfactory when tested for their performance, with some individual models 'fitting' better than others. However, one of the virtues offered by genetic programming is that it exploits and manipulates these differences until the best fitting models, in terms of least error, are produced.

(2) Following this initial generation of randomly selected models, a random subset (sub-tree) of the population is then selected for a tournament. Hence this process is known as a tournament selection phase. This process (tournament procedure) is essentially a selection mechanism to decide which individuals from the population are to be selected for reproduction to develop the next generation.

(3) An evaluation of the members of this subset is then carried out and a fitness value is assigned. As stated by Koza (1998), the fitness cases are either selected at random or in some structured manner (e.g. at regular intervals). In our application, as mentioned briefly in the first step, the fitness value is defined as the mean squared error (MSE), with the lowest MSE being targeted as the best. Furthermore, the fitness may be measured in terms of the sum of the absolute values of the differences between the output produced by the model and the desired output (i.e. the Minkowski distance) or, alternatively, the square root of the sum of the squared errors (i.e. the Euclidean distance).

(4) Following the establishment of fitness values, the tournament winners are then determined. To reiterate, the winners of this scenario are the models with the lowest MSE.

(5) Having identified the tournament winners in the previous step, we then proceed by exposing the models to two genetic operators known as mutation and crossover. Both operators are discussed in more detail below:

4.3.1. Mutation
This is the creation of a new model that is mutated randomly from an existing one, as circled in Fig. 7 (point 1). This mutation point is indiscriminately chosen as an independent point and the resulting sub-tree is omitted. From this resulting sub-tree, another new sub-tree (point 2) is then reproduced using the same procedure that was initially implemented to create the original random population. Although this was the procedure we implemented for mutation, there are also a number of alternative methods that are explored in other research.

[Figure 7 omitted.]
Fig. 7. Mutation tree structure example.

4.3.2. Crossover
This operator creates two new models from existing models by genetically recombining randomly chosen parts of them. This is achieved by using the crossover operation applied at a randomly chosen crossover point within each model. Because entire sub-trees are swapped (from point 1 to point 2 and from point 3 to point 4), the crossover operation produces models as offspring (see Fig. 8). Furthermore, the models are selected based on their fitness, and the crossover allocates future trials to regions of the search space whose models contain parts from superior models. As a full explanation of crossovers is beyond the scope of this paper, please refer to Koza (1992) for more details.

[Figure 8 omitted.]
Fig. 8. Crossover family tree like structure example.

(6) The population is then altered, with the tournament losers being replaced by the winners' (superior) offspring.

(7) Provided the termination criterion is not reached, the algorithm returns to step 2 and these steps are repeated until the predefined termination criterion for genetic programming is satisfied. In our study we have set the termination criterion to 100,000 generations, at which point the cycles are stopped and forecasted results can be obtained.

(8) Ultimately, this protocol produces the best individual (model) of the population as a result.

Since the generation of the initial population is randomly constructed, forecasts can differ between GP algorithms. In order to eliminate any variance between our GP forecasts, we used the average of a committee of 10 GP algorithms which presented the highest profit in the training sub-period. The characteristics of the GPs used in this paper are presented in Appendix A.2, while Fig. 9 presents a summary of the steps described above.

[Figure 9 omitted: flowchart, generation of an initial population → evaluation → selection → reproduction (crossover, mutation) → new generation → termination check → end.]
Fig. 9. The architecture of the genetic programming algorithm.

4.4. The Gene Expression algorithm

As mentioned before, the models in GEP are symbolic strings of fixed length representing an organism's genome (chromosome/genotype), but these simple entities are encoded as non-linear entities of different sizes and shapes determining an organism's fitness (expression trees/phenotype). GEP chromosomes are made up of multiple genes spanning equal lengths across the structure of the chromosome. Each gene is comprised of a head (detailing symbols specific to functions and terminals, i.e. +, −, AND, NOR) and a tail (which only includes terminals). For a mathematical representation, please refer to the equation below:

t = h(n - 1) + 1    (14)

where h is the head length of the gene, t is the tail length of the gene, and n is the total number of arguments within the function.10

10 This is determined by the user. In most cases a function will either be a Boolean function or any mathematical function that is suited to a specific problem.
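For illustration, a worked instance of Eq. (14) with assumed values (our own example; the study's actual GEP settings are in Appendix A.2): with functions taking at most n = 2 arguments and a head of length h = 7, the tail must contain t = 7 × (2 − 1) + 1 = 8 terminals, so the gene spans h + t = 15 symbols and is guaranteed to decode into a syntactically valid expression tree.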

As an inference, the set of terminals included within both the heads and tails of the chromosomes contains constants as well as case-specific variables. In addition, regardless of the fact that each of the genes is equal and fixed in size, they hold the capacity to code for multiple and varied expression trees (ETs). For example, the structure of GEP is able to cope in circumstances when the first element of a gene is a terminal, producing a single node, as well as when multiple nodes ('sub-trees' reproduced by functions) are produced in search of eventual terminality. In contrast with its predecessors, GEP does not require the rejection of invalid individuals from the population, as valid ETs are always generated. Thus, each gene encodes an ET, and in situations where multiple generations arise, the GEP creates sub-ETs with interlinking functions to enable reproduction of future generations. Furthermore, the expression of each ET is enabled by an Open Reading Frame (ORF) which assists in the decoding process. Additionally, while the ORF is initiated at the beginning of each gene, it has to be understood that the eventual terminal points are not always located at the end of the gene (for more details see Ferreira (2006)).

Although it is crucial to understand the workings of a GEP, it is just as important to understand its step-by-step optimization protocol. This is depicted in Fig. 10.

[Figure 10 omitted.]
Fig. 10. GEP Optimisation Protocol.

The different steps of the algorithm from the above diagram are explained in more detail as follows:

4.4.1. Creation of initial population
Similar to other evolutionary algorithms, GEP randomly generates an initial population of individuals, and all succeeding populations are spawned from this initial population. In the spawning of new generations, genetic operators evolve each of the individuals by 'mating' them with other individuals in the population. These genetic operators are determined by the nature of the problem which one wants to solve. Genetic operators may include (but are not limited to) '+', '−', '×' and '/' symbols for mathematical models and 'And', 'Or', 'Nand', 'Nor', 'Xor', 'Nxor', '<', '>', '< or =' and '> or =' for logical expressions, as explained by Ferreira (2001). Therefore, the terminals and functions (symbols) may vary from problem to problem. Other intricacies such as gene size also have to be specified by the user at this stage.

4.4.2. Express chromosomes
In this step we progress by developing expression trees (ETs) from our chromosomes. The structure of each ET is such that the root, or first node, corresponds with the beginning of each gene. The resulting offspring evolved from the first node depends on the number of arguments: functions may have numerous arguments, whereas terminals take 0 arguments. Each of the resulting offspring's characteristics is populated in nodes ordered from left to right. This process is concluded once terminal nodes are established.

4.4.3. Evaluate fitness
In order to create an accurate model suited to our forecasting requirements, it is imperative that a function which minimizes error and improves accuracy is used. Therefore, in order to evolve our initial population in line with our target market, we must clearly define the goal of our model. Ultimately, this 'fitness function' determines the optimality of our solution. In our application, as mentioned before for the GP algorithm, the fitness value is defined as the mean squared error (MSE) over the training period, with the lowest MSE being targeted as the best. On the odd occasion, some of the individuals that are generated randomly to create our initial population provide suitable solutions and hence arrive at terminal functions. More often than not, though, individuals of the initial population are poor candidates for the purpose of the investigation and require further evolution to achieve terminal values.

4.4.4. Keep best program
In our GEP model, the main principle during the process of evolution is the generation of offspring from two superior individuals to achieve 'elitism'. As a consequence, the best individuals from the parent generation produce offspring in future generations with the most desirable traits, whilst the individuals with less desirable traits are removed. On this basis our model minimizes error and maintains superior forecasting abilities. As explained in greater detail by Ferreira (2001), elitism is the cloning of the best chromosome(s)/individual(s) to the next population (also called generation). Furthermore, the role of 'elitism' (via suited genetic operators) enables the selection of fitter individuals without eliminating the entire population.

4.4.5. Selection
The selection of individuals based on their 'fitness' is carried out during the 'tournament' selection for reproduction and modification. This process selects individuals at random, with the superior ones being chosen for genetic modification in order to create new generations. The intensity of competition is dictated by the tournament size, which is adjusted and set by the practitioner. The greater the tournament size, the more competitive the selection process, and therefore weaker individuals are less likely to compete.

4.4.6. Reproduction
In the reproduction of future generations we have to consider the types of genetic operators which make this 'evolution' possible. Specifically, we apply the genetic operators known as mutation and recombination, as explained below.

4.4.6.1. Mutation. This is the creation of a new model that is mutated randomly from an existing one. Firstly, a parent is randomly selected with a probability related to its fitness. Then the mutation point on the parent's chromosome is indiscriminately chosen as an independent point. Afterwards, the mutation randomly changes one or more genes representing part of the solution it encodes, and the new mutated individual is added to the population. This procedure is described in detail in Ferreira (2006).

4.4.6.2. Recombination. In contrast to our mutation operator, this process is not executed at random. Instead, the parent chromosomes are matched and split up or 'spliced' at identical points in order to determine recombination points. The subsequent spliced parts of each of the genes are then exchanged between the two selected chromosomes on the basis of probability. This results in two new individuals as a result of genetic engineering. Note that during reproduction it is the chromosomes of the individuals, not the expression trees, that are reproduced with modification and transmitted to the next generation (see Fig. 11).

[Figure 11 omitted.]
Fig. 11. Chromosome recombination structure example.

4.4.7. Prepare new programs of the next generation
At this step, we replace the tournament losers in the population with the new individuals created by reproduction.

4.4.8. Termination criterion
We check if the termination criterion is fulfilled; if it is not, we return to step 2. As a termination criterion we used a maximum number of 100,000 generations, during which the GEP was left to run.

4.4.9. Results
As a result we return the best individual ever found during the evolution process.

Since the generation of the initial population is randomly constructed, forecasts can differ between GEP algorithms. In order to eliminate any variance between our GEP forecasts, we used the average of a committee of 10 GEP algorithms which presented the highest profit in the training sub-period. The characteristics of these GEPs are presented in Appendix A.2.

5. Empirical results

5.1. Trading and statistical performance

The in-sample statistical performance of our models is presented in Table 4 below, while Appendix A.3 describes the statistical measures. For all four of the error statistics retained (RMSE, MAE, MAPE and Theil-U), the lower the output, the better the forecasting accuracy of the model concerned.

We note that GEP presents the best in-sample statistical performance, with PSI Sigma having the second most accurate forecasts in the in-sample period. GP, MLP and RNN have the third, fourth and fifth most accurate forecasts respectively.

In Table 5 we present the trading performance of our models in the training and test sub-periods, while Appendix A.4 documents the trading performance measures. The trading strategy applied is simple: go or stay long when the forecast return is above zero and go or stay short when the forecast return is below zero.
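The strategy rule translates directly into daily positions; a minimal sketch (our own, with hypothetical array names, assuming forecasts and realised returns are aligned day by day):

```python
import numpy as np

def strategy_returns(forecasts, actual_returns):
    """Go (or stay) long when the forecast return is above zero and
    short when it is below; the daily P&L is position * realised return."""
    positions = np.where(np.asarray(forecasts) > 0, 1.0, -1.0)
    return positions * np.asarray(actual_returns)

def annualised_return(daily_returns):
    """Annualised return as in Eq. (19): 252 times the mean daily return."""
    return 252.0 * float(np.mean(daily_returns))
```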

We note that GEP and the PSI network do equally well and outperform all other models in terms of information ratio and annualised return. The genetic programming algorithm presents the third best performance, while the MLP has the fourth highest information ratio and annualised return. Generally, all our models except the naïve present a satisfactory trading performance, with positive annualised return and reasonable maximum drawdown.

A summary of the trading empirical results in the out-of-sample period is presented in Table 6 below, while Appendix A.5 includes the statistical performance of our models for the same period.

We can see that the GEP outperforms all other models in terms of trading performance. It presents a slightly higher annualised return and information ratio than the PSI network and almost 50% higher performance than GP, our third best model. Although, due to the ability of GEPs to distinguish the differences between the genotype and the phenotype, we were expecting them to outperform GP, we observe that they also clearly outperform the traditional MLPs and RNNs. On the other hand, the RNNs present a rather disappointing performance compared to their benchmarks despite their additional memory inputs. Moreover, it is worth noting that the major theoretical advantage of Psi Sigma networks, namely their speed, was clearly confirmed, as we achieved our results in approximately half of the time needed to train the GPs and MLPs and one tenth of the time needed for the RNNs and GEPs.11

11 The training time for a PSI Sigma network with a Fujitsu Celsius W280 and the characteristics presented in Appendix A.2 was 5 min.

5.2. Transaction costs

Up to now, we have presented the trading results of all our models without considering transaction costs. Since some of our models trade quite often, taking transaction costs into account might change the whole picture.

The transaction costs for a tradable amount, say USD 5–10 million, are about 1 pip (0.0001 EUR/USD) per trade (one way) between market makers. But since we consider the EUR/USD time series as a series of middle rates, the transaction cost is one spread per round trip. With an average EUR/USD exchange rate of 1.369 for the out-of-sample period, a cost of 1 pip is equivalent to an average cost of 0.007% per position.

In Table 7 below we present the out-of-sample performance of our models after transaction costs.

From the table above we can see that, after transaction costs, GEP continues to outperform all the other strategies in terms of annualised return.

5.3. Leverage to exploit high information ratios

In order to further improve the trading performance of our models, we introduce a leverage based on RiskMetrics one-day-ahead volatility forecasts.12 The intuition of the strategy is to avoid trading when volatility is very high while at the same time exploiting days when the volatility is relatively low. As mentioned by Bertolini (2010), there are few papers on market-timing techniques for foreign exchange, with the notable exception of Dunis and Miao (2005, 2006). The opposition between market-timing techniques and time-varying leverage is only apparent, as time-varying leverage can also easily be achieved by scaling position sizes inversely to the recent behaviour of risk measures.

12 We also explored a GJR(1,1) model for forecasting volatility. Its statistical accuracy in the test sub-period in terms of the MAE, MAPE, RMSE and Theil-U statistics is only slightly better compared with RiskMetrics. However, when we measure the utility of GJR in terms of trading efficiency for our models within the context of our strategy in the test sub-period, our results in terms of annualised returns are slightly better with RiskMetrics for most of our models. Moreover, RiskMetrics is simpler to implement than the more complicated GJR. Therefore, we choose to present in this paper the results obtained with RiskMetrics. The results obtained with GJR, which are very close to the ones presented here, are available upon request. It is also worth noting that the ranking of our models in terms of information ratio and annualised return is the same whether we use GJR or RiskMetrics.


Table 4
In-sample statistical performance.

          NAIVE   ARMA    MLP     RNN      PSI      GP       GEP
MAE       0.0066  0.0042  0.0046  0.0033   0.029    0.0036   0.0035
MAPE (%)  90.13   48.57   35.21   39.18    14.94    18.51    13.91
RMSE      0.0089  0.0056  0.0063  0.00285  0.00329  0.00352  0.00164
Theil-U   0.2014  0.4029  0.3457  0.3361   0.2442   0.2665   0.2263

Table 5
In-sample trading performance.

                                             NAIVE   ARMA   MLP     RNN     PSI    GP      GEP
Information ratio (excluding costs)          0.19    1.49   2.02    1.36    4.24   2.52    4.30
Annualised volatility (excluding costs) (%)  9.97    9.91   9.95    9.99    9.55   9.60    9.68
Annualised return (excluding costs) (%)      1.93    14.78  20.07   13.55   40.49  24.17   41.58
Maximum drawdown (excluding costs) (%)       −27.33  −6.97  −14.49  −18.05  −9.81  −10.57  −14.00
Positions taken (annualised)                 128     123    82      96      121    127     99

Table 6
Validation trading performance results.

                                             NAIVE   ARMA    MLP     RNN     PSI    GP      GEP
Information ratio (excluding costs)          −0.63   0.11    0.90    1.18    1.99   1.62    2.16
Annualised volatility (excluding costs) (%)  11.49   11.52   11.50   11.48   11.22  11.05   11.41
Annualised return (excluding costs) (%)      −7.27   1.27    10.39   13.50   22.29  17.89   24.63
Maximum drawdown (excluding costs) (%)       −23.15  −15.72  −12.86  −10.89  −7.80  −16.78  −17.27
Positions taken (annualised)                 130     93      142     140     152    131     105

Table 7
Validation trading performance results after transaction costs.

                                             NAIVE   ARMA    MLP     RNN     PSI    GP      GEP
Information ratio (excluding costs)          −0.63   0.11    0.90    1.18    1.99   1.62    2.16
Annualised volatility (excluding costs) (%)  11.49   11.52   11.50   11.48   11.22  11.05   11.41
Annualised return (excluding costs) (%)      −7.27   1.27    10.39   13.50   22.29  17.89   24.63
Maximum drawdown (excluding costs) (%)       −23.15  −15.72  −12.86  −10.89  −7.80  −16.78  −17.27
Positions taken (annualised)                 130     93      142     140     152    131     105
Transaction costs (%)                        0.91    0.65    0.99    0.98    1.06   0.92    0.74
Annualised return (including costs) (%)      −8.18   0.62    9.39    12.52   21.23  16.97   23.90


Firstly, we forecast with RiskMetrics the one-day-ahead realised volatility of the EUR/USD exchange rate in the test and validation sub-periods. Then, following Dunis and Miao (2005, 2006), we split these two periods into six sub-periods, ranging from periods with extremely low volatility to periods experiencing extremely high volatility. Periods with different volatility levels are classified in the following way: first, the average (μ) difference between the actual volatility in day t and the forecast for day t+1, and its 'volatility' (measured in terms of standard deviation, σ) are calculated; those periods where the difference lies between μ and μ plus one σ are classified as 'Lower High Vol.' periods. Similarly, 'Medium High Vol.' (between μ+σ and μ+2σ) and 'Extremely High Vol.' (above μ+2σ) periods can be defined. Periods with low volatility are also defined following the same 1σ and 2σ approach, but with a minus sign.

For each sub-period a leverage is assigned, starting with 0 for periods of extremely high volatility up to a leverage of 2.5 for periods of extremely low volatility. Table 8 below presents the sub-periods and their relevant leverages.

The parameters of our strategy (μ and σ) are updated every three months by rolling forward the estimation period. For example, for the first three months of our validation period, μ and σ are computed based on the eighteen months of the test sub-period. For the following three months, the two parameters are computed based on the last fifteen months of our test sub-period and the first three months of the validation sub-period.

The cost of leverage (interest payments for the additional capital) is calculated at 1.75% p.a. (that is, 0.0069% per trading day13). Our final results are presented in Table 9.

13 The interest costs are calculated by considering a 1.75% interest rate p.a. divided by 252 trading days. In reality, leverage costs also apply during non-trading days, so that we should calculate the interest costs using 360 days per year. But for the sake of simplicity, we use the approximation of 252 trading days to spread the leverage costs of non-trading days equally over the trading days. This approximation prevents us from keeping track of how many non-trading days we hold a position.

As can be seen from the last row of Table 9, all models except the ARMA strategy manage to exploit the time-varying leverage and substantially increase their trading performance in terms of annualised return. The PSI network achieves the highest information ratio and annualised return net of costs among the seven competing models, very close to those achieved by the GEP. The GP, the RNN and the MLP present the third, fourth and fifth highest trading performance respectively. It is worth noting that the naïve strategy, which had a negative performance before the introduction of the leverage, now presents a positive annualised return of 4.56% net of costs.


Table 12. Statistical performance measures.

Mean absolute error (with $\sigma_s$ being the actual volatility and $\hat{\sigma}_s$ the forecasted value):
$$\mathrm{MAE} = \frac{1}{n}\sum_{s=t+1}^{t+n}\left|\hat{\sigma}_s - \sigma_s\right| \qquad (15)$$

Mean absolute percentage error:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{s=t+1}^{t+n}\left|\frac{\sigma_s - \hat{\sigma}_s}{\sigma_s}\right| \qquad (16)$$

Root mean squared error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{s=t+1}^{t+n}\left(\hat{\sigma}_s - \sigma_s\right)^2} \qquad (17)$$

Theil-U:
$$\text{Theil-U} = \frac{\sqrt{\frac{1}{n}\sum_{s=t+1}^{t+n}\left(\hat{\sigma}_s - \sigma_s\right)^2}}{\sqrt{\frac{1}{n}\sum_{s=t+1}^{t+n}\hat{\sigma}_s^2} + \sqrt{\frac{1}{n}\sum_{s=t+1}^{t+n}\sigma_s^2}} \qquad (18)$$
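For reference, Eqs. (15)–(18) can be computed in a few lines of numpy. This is our own sketch with hypothetical argument names, not the authors' code:

import numpy as np

def statistical_measures(actual, forecast):
    """MAE, MAPE, RMSE and Theil-U as defined in Eqs. (15)-(18)."""
    a, f = np.asarray(actual), np.asarray(forecast)
    e = f - a
    mae = np.mean(np.abs(e))
    mape = np.mean(np.abs(e / a)) * 100.0     # in percent, as reported in Table 14
    rmse = np.sqrt(np.mean(e ** 2))
    theil_u = rmse / (np.sqrt(np.mean(f ** 2)) + np.sqrt(np.mean(a ** 2)))
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "Theil-U": theil_u}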

Table 13. Trading simulation performance measures.

Annualised return (with $R_t$ being the daily return):
$$R_A = 252 \cdot \frac{1}{N}\sum_{t=1}^{N} R_t \qquad (19)$$

Cumulative return:
$$R_C = \sum_{t=1}^{N} R_t \qquad (20)$$

Annualised volatility:
$$\sigma_A = \sqrt{252} \cdot \sqrt{\frac{1}{N-1}\sum_{t=1}^{N}\left(R_t - \bar{R}\right)^2} \qquad (21)$$

Information ratio:
$$SR = \frac{R_A}{\sigma_A} \qquad (22)$$

Maximum drawdown (the maximum negative value of $\sum(R_t)$ over the period):
$$MD = \min_{i=1,\dots,t;\; t=1,\dots,N}\left(\sum_{j=i}^{t} R_j\right) \qquad (23)$$
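Similarly, a compact numpy sketch of Eqs. (19)–(23); this is our own illustration and `daily_returns` is a hypothetical argument name:

import numpy as np

def trading_measures(daily_returns):
    """Annualised return, cumulative return, annualised volatility,
    information ratio and maximum drawdown (Eqs. (19)-(23))."""
    r = np.asarray(daily_returns)
    annual_return = 252.0 * r.mean()                 # Eq. (19)
    cumulative = r.sum()                             # Eq. (20)
    annual_vol = np.sqrt(252.0) * r.std(ddof=1)      # Eq. (21)
    info_ratio = annual_return / annual_vol          # Eq. (22)
    # Eq. (23): most negative cumulative move from any earlier point
    cum = np.concatenate(([0.0], np.cumsum(r)))
    max_drawdown = np.min(cum - np.maximum.accumulate(cum))
    return annual_return, cumulative, annual_vol, info_ratio, max_drawdown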

Table 10. Network characteristics.

Parameters                  MLP                Recurrent          Psi Sigma
Learning algorithm          Gradient descent   Gradient descent   Gradient descent
Learning rate               0.001              0.001              0.5
Momentum                    0.003              0.003              0.5
Iteration steps             10,000             10,000             10,000
Initialisation of weights   N(0,1)             N(0,1)             N(0,1)
Input nodes                 12                 12                 12
Hidden nodes (1 layer)      7                  5                  6
Output node                 1                  1                  1

Table 8. Sub-periods and leverages.

Sub-period   Extremely low vol.   Medium low vol.   Lower low vol.   Lower high vol.   Medium high vol.   Extremely high vol.
Leverage     2.5                  2                 1.5              1                 0.5                0

Table 9. Validation trading performance – final results.^a

                                                  NAIVE    ARMA     MLP      RNN      PSI      GP       GEP
Information ratio (excluding costs)               0.39     −0.17    1.03     1.20     2.03     1.21     1.89
Annualised volatility (excluding costs) (%)       18.65    22.63    20.12    18.31    20.11    20.13    20.51
Annualised return (excluding costs) (%)           7.23     −3.92    20.78    21.88    40.75    24.45    38.67
Maximum drawdown (excluding costs) (%)            −16.57   −22.64   −16.33   −15.19   −13.31   −18.04   −22.24
Positions taken (annualised)                      128      91       140      138      150      129      103
Transaction and leverage costs (annualised) (%)   2.68     2.43     2.78     2.76     2.84     2.71     2.51
Average leverage factor (ex post)^b               n.a.     n.a.     1.92     n.a.     1.79     1.28     1.51
Annualised return (including costs) (%)           4.56     −6.35    18.00    19.13    37.91    21.75    36.15

^a This does not take into account the interest that could be earned during periods when the capital is not traded (non-trading days) or not fully invested, and could therefore be invested.
^b The average leverage factor ex post is computed as the ratio of the annualised returns after costs of Tables 9 and 7 for those models which achieved an in-sample information ratio of at least 2 and, as such, would have been candidates for leveraging out-of-sample.
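As an illustration of footnote b, for the PSI network the ratio of its net annualised return with leverage (Table 9) to its net annualised return without leverage (Table 7) recovers the reported factor:

$$\frac{37.91\%}{21.23\%} \approx 1.79.$$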

Table 11. Genetic algorithms characteristics.

Parameters             GP        GEP
Population size        1000      1000
Tournament size        20        20
Mutation probability   0.75      0.75
Maximum generations    100,000   100,000




6. Concluding remarks

In this paper, we apply two promising classes of artificial intelligence models, the Psi Sigma neural network (PSI) and the Gene Expression algorithm (GEP), to a one-day-ahead forecasting and trading task on the Euro/Dollar (EUR/USD) exchange rate using the European Central Bank (ECB) fixing series. Their trading performance is benchmarked against that of two more standard neural network designs, the Multi-Layer Perceptron (MLP) and the Recurrent Neural Network (RNN), a genetic programming algorithm (GP) and a traditional statistical technique, the autoregressive moving average model (ARMA), plus a naïve strategy.




We develop these different prediction models over the period from September 2001 to February 2009 and validate their out-of-sample trading efficiency over the following period from March 2009 to August 2010. We also examine if the introduction of a time-varying leverage strategy based on RiskMetrics volatility forecasts can improve the trading performance of our models. We use the EUR/USD daily fixing by the ECB as many financial institutions are ready to trade at this level, and it is therefore possible to leave orders with a bank to transact business on that basis.

In our trading simulation, the Psi Sigma and the Gene Expression algorithms do equally well and outperform all other models in terms of annualised return and information ratio, both before and after the application of the trading strategy. It seems that the ability of Psi Sigma to capture higher-order correlations within our dataset, and of Gene Expression to distinguish between the genotype and the phenotype, allows both approaches to excel in our forecasting competition compared with the more 'traditional' MLP and RNN models and genetic programming algorithms. We also observe that, after the application of the time-varying leverage, all our models except ARMA demonstrate a substantial increase in their trading performance in terms of annualised return.

It is also important to note that the PSI network, which presents the joint best trading performance with the GEP, needs considerably less training time compared to its neural and genetic programming benchmarks, a highly desirable feature in a real-life quantitative investment and trading environment. Our results should go some way towards convincing a growing number of quantitative fund managers to experiment beyond the bounds of the more traditional models and trading strategies.

Appendix A

A.1. ARMA model

The output of the ARMA model used in this paper is presented below.

Dependent Variable: RETURNS
Method: Least Squares
Date: 11/06/10  Time: 17:40
Sample (adjusted): 12 1892
Included observations: 1881 after adjustments
Convergence achieved after 18 iterations
Backcast: 1 11

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          0.000187      0.000152     1.227198      0.2199
AR(1)      0.395388      0.059344     6.662621      0.0000
AR(4)      −0.903201     0.022091     −40.88631     0.0000
AR(5)      0.280795      0.049537     5.668407      0.0000
AR(11)     0.229901      0.012166     18.89715      0.0000
MA(1)      −0.392357     0.058916     −6.659573     0.0000
MA(4)      0.916159      0.019401     47.22297      0.0000
MA(5)      −0.251859     0.053382     −4.718030     0.0000
MA(11)     −0.219821     0.013029     −16.87201     0.0000

R-squared 0.018014; Adjusted R-squared 0.013818; S.E. of regression 0.006245; Sum squared resid 0.073015; Log likelihood 6883.309; Durbin–Watson stat 1.979275; Mean dependent var 0.000185; S.D. dependent var 0.006289; Akaike info criterion −7.309206; Schwarz criterion −7.282701; F-statistic 4.292644; Prob(F-statistic) 0.000039.

Inverted AR Roots: .83, .77 ± .61i, .51 ± .72i, −.11 ± .77i, −.68 ± .73i, −.71 ± .31i
Inverted MA Roots: .82, .78 ± .62i, .51 ± .71i, −.11 ± .76i, −.68 ± .73i, −.71 ± .30i
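For readers who wish to reproduce a restricted ARMA of this form (only lags 1, 4, 5 and 11 in both polynomials), recent versions of statsmodels accept lag lists in the order argument. This is our own sketch, not the authors' original estimation; `returns` is an assumed pandas Series of daily EUR/USD returns, and statsmodels >= 0.12 is required:

# Fit a restricted ARMA keeping only AR/MA lags 1, 4, 5 and 11.
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(returns, order=([1, 4, 5, 11], 0, [1, 4, 5, 11]), trend="c")
result = model.fit()
print(result.summary())   # coefficient table comparable to the output above

Note that the estimates will not match the table exactly: the output above comes from least-squares estimation with backcasting, while statsmodels fits the model by state-space maximum likelihood.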


A.2. Networks characteristics

In Table 10, we present the characteristics of the neural networks with the best trading performance in the test sub-period, which we used in our committees, while Table 11 includes the characteristics of the best genetic algorithms in the in-sample period.

A.3. Statistical measures

The statistical measures are calculated as in Table 12.

A.4. Trading performance measures

The trading performance measures are calculated as in Table 13.

A.5. Out-of-sample statistical performance

Table 14 presents the out-of-sample statistical performance of our models.

Table 14. Out-of-sample statistical performance.

           NAIVE    ARMA       MLP      RNN        PSI       GP        GEP
MAE        0.0080   0.0066     0.0064   0.0062     0.0053    0.0058    0.0054
MAPE (%)   156.52   64.16      56.60    62.59      43.04     49.22     45.22
RMSE       0.0103   0.073494   0.0726   0.058036   0.03768   0.04468   0.042129
Theil-U    0.7129   0.869891   0.5741   0.5943     0.5220    0.5643    0.5473

References

Bera, A., & Jarque, C. (1981). An efficient large-sample test for normality of observations and regression residuals. Australian National University working papers in economics and econometrics, 40.

Bertolini, L. (2010). Trading foreign exchange carry portfolios. PhD Thesis, Cass Business School, City University London.

Box, G., Jenkins, G., & Gregory, G. (1994). Time series analysis: Forecasting and control. New Jersey: Prentice-Hall.

Chen, S. (2002). Genetic algorithms and genetic programming in computational finance. Amsterdam: Kluwer Academic Publishers.

Dehuri, S., & Cho, S. B. (2008). Classification rule mining using gene expression programming. Third International Conference on Convergence and Hybrid Information, 2, 754–760.

Dunis, C., Laws, J., & Sermpinis, G. (2011). Higher order and recurrent neural architectures for trading the EUR/USD exchange rate. Quantitative Finance, 11(4), 615–629.

Dunis, C. L., & Miao, J. (2005). Optimal trading frequency for active asset management: Evidence from technical trading rules. Journal of Asset Management, 5(5), 305–326.

Dunis, C. L., & Miao, J. (2006). Advanced frequency and time domain filters for currency portfolio management. Journal of Asset Management, 7(1), 22–30.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.

Ferreira, C. (2001). Gene expression programming: A new adaptive algorithm for solving problems. Complex Systems, 13, 87–129.

Ferreira, C. (2006). Gene expression programming: Mathematical modelling by an artificial intelligence. San Francisco: Springer.

Ghazali, R., Hussain, A., & Merabti, M. (2006). Higher order neural networks for financial time series prediction. In The 10th IASTED International Conference on Artificial Intelligence and Soft Computing (pp. 119–124), Palma de Mallorca, Spain.

Ghosh, J., & Shin, Y. (1992). Efficient higher-order neural networks for classification and function approximation. International Journal of Neural Systems, 3(4), 323–350.

Hussain, A., Ghazali, R., & Al-Jumeily, D. (2006). Dynamic ridge polynomial neural network for financial time series prediction. In IEEE International Conference on Innovation in Information Technology, IIT06, Dubai.

Kaastra, I., & Boyd, M. (1996). Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10, 215–236.

Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.

Koza, J. R. (1998). Genetic programming. In J. G. Williams & A. Kent (Eds.), Encyclopedia of computer science and technology (pp. 29–43). New York: Marcel Dekker.

Lisboa, P., & Vellido, A. (2000). Business applications of neural networks. In P. Lisboa, B. Edisbury, & A. Vellido (Eds.), Business applications of neural networks: The state-of-the-art of real-world applications (pp. vii–xxii). Singapore: World Scientific.

Lopez, S., & Weinert, R. (2004). An enhanced gene expression programming approach for symbolic regression problems. International Journal of Applied Mathematics in Computer Science, 14, 375–384.

Margny, M. H., & El-Semman, I. E. (2005). Extracting logical classification rules with expression programming: Micro array case study. In Conference Proceedings, AIML 05, Cairo, Egypt.

Pindyck, R., & Rubinfeld, D. (1998). Econometric models and economic forecasts (4th ed.). New York: McGraw-Hill.

Shapiro, A. F. (2000). A Hitchhiker's guide to the techniques of adaptive nonlinear models. Insurance: Mathematics and Economics, 26, 119–132.

Shin, Y., & Ghosh, J. (1991). The Psi-Sigma network: An efficient higher-order neural network for pattern classification and function approximation. In Proceedings IJCNN, Seattle, July (pp. 13–18).

Tenti, P. (1996). Forecasting foreign exchange rates using recurrent neural networks. Applied Artificial Intelligence, 10, 567–581.

Vecci, L., Piazza, F., & Uncini, A. (1998). Learning and approximation capabilities of adaptive spline activation neural networks. Neural Networks, 11, 259–270.