Forecasting Daily and Monthly Exchange Rates With Machine Learning Techniques

Download Forecasting Daily and Monthly Exchange Rates With Machine Learning Techniques

Post on 21-Jul-2016




1 download


  • 1

    Forecasting daily and monthly exchange rates with machine learning


    Vasilios Plakandaras1,

    Theophilos Papadimitriou*, Periklis Gogas


    Department of Economics

    Democritus University of Thrace.


    We combine signal processing to machine learning methodologies by introducing a

    hybrid Ensemble Empirical Mode Decomposition (EEMD), Multivariate Adaptive

    Regression Splines (MARS) and Support Vector Regression (SVR) model in order to

    forecast the monthly and daily Euro (EUR)/United States Dollar (USD), USD/Japanese

    Yen (JPY), Australian Dollar (AUD)/Norwegian Krone (NOK), New Zealand Dollar

    (NZD)/Brazilian Real (BRL) and South African Rand (ZAR)/Philippine Peso (PHP)

    exchange rates. After the decomposition with EEMD of the original exchange rate series

    into a smoothed and a fluctuation component, MARS selects the most informative input

    datasets from the plethora of variables included in our initial data set. The selected

    variables are fed into two distinctive SVR models for forecasting each component

    separately one period ahead for daily and monthly data. The summation of the two

    forecasted components provides exchange rate forecasts. The above implementation

    exhibits superior forecasting ability in exchange rate forecasting compared to various

    models. Overall the proposed model a) is a combination of empirically proven effective

    techniques in forecasting time series, b) is data driven, c) relies on minimum initial

    assumptions and d) provides a structural aspect of the forecasting problem.

    JEL Code: G15

    Key words: Exchange rate forecasting, Support Vector Regression, Multivariate

    Adaptive Regression Splines, variable selection, Ensemble Empirical Mode

    Decomposition, time series forecasting.

    1 *, corresponding author +

  • 2

    1. Introduction

    Following the breakdown of the Bretton Woods fixed exchange rate system, a large

    number of models were proposed in order to outperform the random walk model in

    exchange rate forecasting. Forecasting the evolution of exchange rates apart from

    influencing the way an investor chooses to allocate his wealth into different portfolios, is

    also a key determinant in macroeconomic variable forecasting, since future

    macroeconomic information is projected on exchange rate values and vice versa (Rime,

    Sarno and Sojli, 2010).

    There is a vast literature regarding exchange rate forecasting. Overall, we can separate

    existing research initiatives into two general categories: models examining the influence

    of macroeconomic variables on the exchange rate markets and those that build on a

    micro-structural market approach. The first category includes the so-called monetary

    exchange rate models which attempt to establish a link between the evolution of

    exchange rates and the variability of fundamentals. To mention just a few, one of the

    first attempts to examine exchange rate evolution is the classic model of Mundell (1968)

    and Fleming (1962) and its extension, i.e. the sticky price monetary model proposed by

    Dornbuch (1976). On a similar perspective, the flexible price monetary model was for

    many years the workhorse of exchange rate economics (Bilson; 1978, Stockman, 1980;

    Lucas, 1982) extended by Frankel (1979) to its real interest rate differential variant.1

    A large part of the critique against the ability of fundamentals to forecast exchange rates

    stems from the work of Messe and Rogoff (1983). In their seminal paper they test

    various monetary exchange rate models against a random walk (RW) with a drift model

    in out-of-sample forecasting. They find that regardless of the version of the monetary

    model used, none outperforms the RW in terms of Mean Square Error. This suggestion

    has been for many years the standard benchmark for exchange rate forecasting models.

    Mark and Sul (2001) using panel regression models reject the univariate framework of

    Messe and Rogoff (1983) concluding that fundamentals possess significant forecasting

    1 A complete reference of all monetary exchange rate literature is beyond the scope of this paper. An interest reader is referenced to Ronald Macdonald (2010).

  • 3

    ability on cross sectional data examination. On a broader perspective, Cheung et al

    (2005) argue against the ability of the widely used Mean Square Error metric for

    evaluating the forecasting power of a structural model. Studying a vast variety of

    monetary exchange rate models, they find that the forecasting ability of each model

    depends on the time period considered for evaluation. On a more promising path,

    Molodtsova and Papell (2009) report superior out-of-sample predictability on short

    forecasting horizons for monetary structural models compared to a RW for a sample of

    eleven currencies. Overall, the examination of the existing literature leads to the

    conclusion that fundamentals have not yet established their value in exchange rate

    forecasting since their main drawbacks have not been convincingly overturned.

    Under a different perspective, the microstructure proposition of the exchange rate

    market focuses on extremely short term forecasting (from daily to tick-by-tick data) and

    incorporates a broad category of input variables. Evans and Lyons (2002, 2005) find that

    institutional aspects of the foreign exchange market such as order flow volume (net

    value of long and short positions) can be of significant importance. As Lyons (2001)

    states it, unlike macroeconomic expectations that are based on surveys, order flow is a

    more realistic measurement of market sentiment since it represents ones direct

    willingness to back his beliefs up with money. Nevertheless, Sager and Taylor (2008)

    examine the forecasting ability of various currencies and forecasting horizons and reject

    the superiority of order flow models against the RW. Killeen et al (2006) argue that the

    predictive information content in order flow models decays rapidly over time and thus

    their forecasting ability is time limited since exchange rates revert back to a RW model,

    implying that the exchange rate forecasting puzzle is far from being considered fully


    On similar strands, Karemera and Kim (2006) develop Auto Regressive Integrated

    Moving Average (ARIMA) models that outperform random walk models for a number

    of currencies on a monthly forecasting horizon but the forecasting ability is strongly tied

    to the time period under evaluation. Cheung (1993) detects long-run memory in

    exchange rates and proposes the use of Auto Regressive Fractionally Integrated Moving

    Average, but with limited success. On the other hand, Generalized Auto Regressive

    Conditional Heteroskedasticity (GARCH) models are a popular selection among

    practitioners for volatility modeling, although lacking clear cut validation of their

  • 4

    forecasting ability. Bollerslev and Wright (2001) compare GARCH and Exponential

    GARCH models to autoregressive models on high frequency data, concluding that

    GARCH models possess forecasting power but tend to forecast worse than AR models

    on high frequency series such as daily or intraday data. Later, Galbraith and Kisinbay

    (2005) reach to similar findings with the above for Deutche Mark/USD and Japanese

    Yen (JPY)/USD exchange rates. Projections based on past realized variance with

    autoregressive models provide better one step ahead forecasts than GARCH and

    Fractionally Integrated GARCH models for a period of up to 30 trading days.

    Apart from econometricians, the intense economic interest for exchange rate forecasting

    has attracted researchers from a wide variety of scientific areas. Machine Learning

    methodologies and more specifically Neural Networks (NN) (Semprinis et al, 2013) and

    Support Vector Machines (SVM) gained significant merit based on their ability to model

    non-linear systems with minimum initial assumptions and high forecasting accuracy.

    Dunis and Williams (2002) compare alternative models: NN, Random Walk, Auto

    Regressive Conditional Heteroskedasticity, Moving Average Convergence/Divergence,

    Auto Regressive Moving Average and a Logit model on the EUR/USD exchange rate.

    The sample spans May 2000 to July 2001 and use a daily forecasting horizon. They

    verify the statistical and economic superiority of the NN based scheme over the

    alternative models. Brandl et al., in their 2009 paper, use Genetic Algorithms for feature

    selection and set their model using the SVR methodology. The initial dataset includes

    variables suggested by the Purchasing Power Parity, the Covered Interest Rate Parity,

    and the Uncovered Interest Rate Parity. The proposed model outperforms an NN, an

    OLS regression and an ARIMA model on monthly out-of-sample forecasting for

    EUR/USD, USD/JPY and USD/GBP rates. On a similar research framework, Ince and

    Trafalis (2006) couple ARIMA and SVM for improving directional forecasting of the

    EUR/USD exchange rate and outperforming Logit/Probit models.

    The use of the RW model as a forecasting benchmark is also a contemporaneous testing

    of the Efficient Market Hypothesis (EMH). Proposed by Eugene Fama (1965), states

    that the determination of prices in an efficient market follows a random walk and thus it

    is impossible to create a forecasting model that achieves sustainable positive returns on

    the long-run. The EMH is usually presented in three forms; the weak, the semi-strong

    and the strong form of efficiency. We have a weak-form efficient market when historic

  • 5

    prices of the variable in question cannot forecast the future ones, as the generating

    mechanism of prices follows a random walk. Thus, autoregressive models have no

    forecasting power and the best forecast about next periods price is todays price. Semi-

    strong efficiency imposes more strict assumptions in that all historic prices and all

    publicly available information is already reflected in current asset prices and thus they

    cannot be used successfully in forecasting. Finally, the strong EMH builds on the semi-

    strong case adding all private information and thus making impossible to forecast

    successfully the future evolution of an assets price. Overall, outperforming the RW

    model is an indication of potential economic gains for a trader that follows an alternative

    trading strategy.

    In this paper we propose a hybrid model that combines signal processing to Machine

    Learning methodologies. Following existing literature we compile an initial dataset

    containing a plethora of variables. Then we suppress noise by decomposing all series

    with EEMD into a smoothed and a fluctuation component. MARS selects the most

    informative input dataset for each component of the forecasted exchange rate. On a final

    stage, two SVR models are trained for forecasting each component of the exchange rate

    separately. The additive series of the two forecasted component series is evaluated for

    out-of-sample forecasting.

    We choose one day ahead forecasting as suggested by Rime, Sarno and Sojli (2010)

    because: (i) one-day ahead forecasts are implementable, (ii) it is a relevant horizon for

    practitioners (e.g. most currency hedge funds), (iii) unlike intraday forecasts it involves

    interest rate considerations, and (iv) it is unlikely that gradual learning based on this data

    will allow forecasting at much longer horizons. Moreover, one month ahead forecasting

    is evaluated for comparison to a longer horizon. We consider our method for the Euro

    (EUR)/United States Dollar (USD), USD/Japanese Yen (JPY), Australian Dollar

    (AUD)/Norwegian Krone (NOK), New Zealand Dollar (NZD)/Brazilian Real (BRL) and

    South African Rand (ZAR)/Philippine Peso (PHP) exchange rates. According to the

    triennial survey of BIS (2010) USD, EUR, JPY, AUD and NOK belong to the ten most

  • 6

    traded currencies in the foreign exchange rate market2, while the latter three exhibit very

    small trading volumes. Since the trading volume of a currency reflects the economic

    interest of traders, we select USD, EUR and JPN as high, NOK, AUD and NZD as

    medium and BRL, PHP, ZAR as currencies with small trading volumes respectively. In

    this way we evaluate the forecasting power of our proposed model by examining

    exchange rates of different volumes and thus we expect that different trading interest

    leads to different versions of market efficiency.

    Following the empirical findings of West et al (1993), Cheung et al (2005) and Rime et

    al (2010) statistical evaluation itself cannot guarantee tangible economic gains for a

    trader. Therefore we extend our research framework not only to statistical metrics as

    MSE, but also conduct mean-variance analysis of returns for a dynamic trading strategy

    based on the forecasts provided by our model. We calculate the Sharpe ratio (SR) of a

    trading strategy following forecasts of the proposed model, since SR is a common

    performance measure among practitioners.

    2 USD:42,45%, EUR: 19,05%, JPY:9,5%, NZD:0,9%, NOK: 0,65%, BRL: 0,35%, PHP: 0,1% and ZAR: 0,35% of the total trading volume.

  • 7

    2. Methodology Dataset

    2.1 Methodology overview

    As reported by Ronald Macdonald (2010), the volatility of exchange rates time series is

    significantly higher than the volatility of other macroeconomic variables, with the

    exception of interest rate differentials. The above phenomenon can be attributed to

    transaction costs (which shift the original price of the series away from its original value)

    and to the difficulty of observing the exact price of the exchange rate, since the daily

    exchange rate is the approximation of a large number of tick-by-tick data. Smoothing

    techniques can provide an alternative representation of the exchange rate series closer to

    the original underlying phenomenon, reducing noise on data set.

    In the voluminous literature regarding time series smoothing, several methodologies are

    proposed3. A key issue in all smoothing methodologies is the definition of the optimal

    point, beyond which smoothing results to distortion rather than noise reduction.

    Apparently, since the time series of the actual phenomenon is always unknown and we

    can only observe its noise additive variant, we can only assume its actual representation

    through the use of a smoothing function. Unfortunately, many smoothing

    implementations select ad hoc the smoothness parameters of the model, lacking

    empirical evidence.. To overcome this drawback, our implementation exploits the

    unique abilities of a relatively novel signal decomposition method called Ensemble

    Empirical Mode Decomposition (EEMD) coupled with the criteria proposed by Guo and

    Tse (2013) and the trend extraction method of Moghtaderi et al (2013). The main

    advantage of this smoothing framework is the absence of initial assumptions about data

    and the tuning of all model parameters based solely on data characteristics.

    Another difficulty in training a forecasting model is the proper selection of input

    variables. A large part of existing literature selects arbitrary input dataset of the

    forecasting model, based on the propositions of a theoretical framework such as the

    sticky or the flexible price model aforementioned. Instead of hand-picking and then

    3 For a detailed survey on the field of trend extraction and smoothing on time series the interested reader is referred to Alexandrov et al (2009).

  • 8

    searching for the empirical confirmation of our decision, we implement Multivariate

    Adaptive Regression Splines (MARS) for automatic variable selection based on

    statistical loss metrics. After decomposing all initial time series into a smoothed and a

    residual fluctuating component (from each series we get its two components), we train

    two MARS models for selecting the input variables of every exchange rate; one MARS

    model for the smoothed and one for the fluctuating component. The selected variables

    are then fed into two Support Vector Regression (SVR) models for forecasting each

    component separately. Finally, the summation of the two forecasted component series is

    evaluated for out-of-sample forecasting. An overview of the proposed model is depicted

    in Figure 1.

    Figure 1: Overview of the proposed EEMD-MARS-SVR model.

    An innovation of this paper in comparison to other noise reduction methodologies

    (Premanode and Toumazou, 2013) is that we model the fluctuating part of the initial

    time series separately from its smoothed variant, but taking into account forecasts of

    both parts in order to avoid information loss. In other words, since the observed value of

    the exchange rate in the exchange rate market is always a noise additive version of the

    underlying phenomenon, we study the observed value into a segregated smoothed part

    and a noise component (fluctuating part). Thus we can produce models that are better

    fitted to the characteristics of each series. Then, we restructure a forecasted proxy of the

  • 9

    actual observed value from both component forecasts so as to approximate the actual

    observed exchange rate value, which always includes noise.

    2.2 Ensemble Empirical Mode Decomposition and time series smoothing

    The EEMD is a data driven algorithm that decomposes a time series into finite additive

    oscillatory components called Intrinsic Mode Functions (IMFs). Proposed by Wu and

    Huang (2009), the main advantage of EEMD is the lack of initial assumptions about data,

    such as the need for linearity or stationarity. The decomposition into IMFs is achieved

    through an iterative scheme, until stopping criteria are met and the residual has no local

    maxima. The generic EEMD procedure is described on the Appendix and an example of

    a decomposition applied on daily EUR/USD exchange rate is depicted in figure 2.

    Figure 2: Decomposition of the original (1

    st row) daily EUR/USD exchange rate into 10

    series. The last row (straight line) is the residue of the EEMD process.

    The main difference of EEMD in comparison to simple Empirical Mode Decomposition

    (EMD) is the addition of white Gaussian noise to the original series before

    decomposition. Iteratively white noise is used and an ensemble mean of all IMFs is

  • 10

    taken as the final result (i.e. the mean of the 1st IMF of all decompositions is the final 1


    IMF and so on). The ensemble of the decomposed IMF is relieved from the initially

    added noise4.

    The amplitude of the Gaussian noise variant added is chosen according to the

    maximized relative RMSE criterion proposed by Guo and Tse (2013):

    ( ( ) ( ))

    ( ) ( )

    where n is the number of observations, the original series and ( ) the IMF with

    the maximum correlation with the initial exchange rate series.

    The proposed Relative RMSE is the ratio between the RMSE of the original series and a

    selected IMF. The IMF with the highest correlation is expected to preserve the main

    elements of the exchange rate under decomposition, relieved from the added white noise.

    So, this specific IMF is selected for evaluating the decomposition performance for

    different Gaussian noise amplitudes.

    Following Moghtaderi et al (2013) we compute a smoothed version of the initial time

    series as the sum of the last few IMFs and the residual of the EEMD decomposition.

    With mathematical notation, the above smoothing function is expressed with:

    ( )

    where is the smoothed variant of the initial series for index , L the total number of

    IMFs, r the final EEMD residual and the optimum index of the IMFs to be summed.

    So, smoothing depends on selecting the optimum index , for constructing the

    smoothed series.

    From the decomposition of the original exchange rate series in Figure 2, we notice that

    the frequency of every IMF drops as we move from old to new IMFs, until no further

    4 For more details the interested reader is referenced to Wu et al (2009).

  • 11

    decomposition is possible. Rilling et al (2005) examine fractional Gaussian noise

    functions and find that apart from frequency, the averaged energy of every IMF

    decreases as the index order increases. In general, the mathematical notation of a

    signals energy is given by:

    | |

    ( )

    where denotes time observations of the i-th IMF, is the energy of the i-th IMF

    and L is the total number of IMFs. Then, the averaged energy of an IMF would be:


    Moghtaderi et al (2013) expand the energy drop assumption to broadband functions

    showing that is a decreasing series of i. Nevertheless, when we depart from fractional

    Gaussian noise to general form functions, the energy of certain IMFs increase instead of

    decreasing. Exploiting this empirical observation the authors argue that the optimum

    index of the smoothing function lies with the smallest index, where the average

    energy rises for the first time.

    For instance, in Figure 2, after the decomposition of the daily time series into 10 IMFs

    and the residual, energy rises on the 4th and 6

    th IMF. So by summing up 4th to 10th IMF

    plus the residual, we obtain the smoothed function representation of the red curve in

    Figure 3, while figure 4 depicts implementation on monthly data.

  • 12

    Figure 3: Daily EUR/USD exchange rate and smoothed series.

    Figure 4: Monthly EUR/USD exchange rate and smoothed series.

    2.3 Support Vector Regression (SVR)

    The Support Vector Regression is a direct extension of the classic Support Vector

    Machine model. The algorithm proposed by Vladimir Vapnik et al (1992), originates

    from the field of statistical learning. When it comes to regression, the basic idea is to

  • 13

    find a linear function that has at most a predetermined deviation from the actual values

    of the initial series. In other words we do not care about the error of each forecast as

    long as it doesnt violate a threshold, but we will not tolerate a higher deviation. The

    Support Vector (SV) set which bounds this error-tolerance band is located in the

    dataset through a minimization procedure.

    One of the main advantage of SVR in comparison to other machine learning techniques

    is the ability to identify global minima avoiding local ones, thus reaching an optimal

    solution. This aspect is crucial to the generalization ability of any model in producing

    accurate and reliable forecasts. The model is built in two steps: the training step and the

    testing step. In the training step, the largest part of the dataset is used for the estimation

    of the function (i.e. the detection of the Support Vectors that define the band); in the

    testing step, the generalization ability of the model is evaluated by checking the models

    performance in the small subset that was left aside in the first step performing out-of-

    sample forecasting.

    Using mathematical notation we start from a training data set

    ( ) ( ) ( ) , where are

    observation samples and is the dependent variable (the target of the regression system

    that we need to approximate). The methods scope is to minimize the loss function


    ) subject to| (

    )| , thus we enforce an upper

    deviation threshold creating an error-tolerance band.

    Expanding this initial framework, Vapnik and Cortes (1995) propose a soft margin

    model, i.e. they allowed the existence of data points outside the error tolerance zone,

    though they penalized them proportionally to the distance from the edge of the zone. So,

    they introduce slack variables to the loss function and controlled through a cost

    parameter C, resulting in the loss function (


    ) ).

    In this way the primal problem that we wish to solve is:

  • 14


    ) (

    ) ( )



    ( )


    are the Lagrange multipliers from the Lagrangian primal function (5).

    Instead of solving (3) we attack the dual form of the problem, which takes the form:



    ) ( )

    ( ) (

    ) ) (6)

    subject to ( ) and

    The solution of the primal problem (5) is

    ( )

    ( )

    and ( )

    ( )

    Naturally, not all phenomena can be described by strictly linear functions. To overcome

    such a drawback, we map initial data into a higher dimensional space were such a linear

    function exists (Figure 5). This mapping is achieved through the use of a kernel function,

    also known as kernel trick, since the computation of such a projection for a

    multivariate problem would be impossible. Then the linear solution of the procuring

    univariate equation is re-projected back to its original dimensional space. In this way,

    with the selection of both linear or non-linear kernels SVR can also approximate non-

    linear phenomena.

  • 15

    Figure 5: Upper and lower threshold on error tolerance indicated with letter . The boundaries of the error tolerance band are defined by Support Vectors (SVs). On the

    right we see the projection form 2 to 3 dimensions space and the projected error

    tolerance band. Forecasted values greater than get a penalty according to their distance from the tolerance accepted band (source: Scholckopf and Smola, 1998).

    The described mapping is performed on four kernels: the linear, the radial basis function

    (RBF), the sigmoid and the polynomial. The mathematical representation of each kernel


    Linear ( ) (9)

    RBF ( )


    Polynomial ( ) ( )


    Sigmoid(MLP) ( ) ( ) (12)

    with factors d, r, representing kernel parameters.

    For the construction of the SVR model we used LIBSVM, an SVM model computation

    software package developed by Chang and Lin (2011).




  • 16

    2.4 Multivariate Adaptive Regression Splines

    Multivariate Adaptive Regression Splines (MARS) proposed by Freidman (1991), is a

    non-parametric form of piece-wise regression. In global parametric methods such as

    linear regression, the relationship between a depended variable and a set of explanatory

    variables is described using a global parametric function fitted universally to all data set.

    In a non-parametric approach the available data are separated into sub-regions and a

    model is locally fitted to each sub-region. The points separating the regions are called

    knots. The position and number of the knots is determined iteratively based on a lack-of-

    fit criterion.

    Starting from a training dataset ( ) ( ) ( )

    , where are input variables and is the dependent variable vector (the

    target of the regression system that we need to approximate), MARS builds a model of

    the form

    ( ) ( ) (13)

    where is a constant, are local regression model coefficients of each sub-region, m

    the number of the sub-regions of the data and ( ) are spline basis functions of the


    ( ) ( ) (14)

    where is the corresponding knot of the sub-region i (a constant number) and the

    data sample of the sub-region i.

    MARS models are developed through a two stage forward/backward stepwise regression

    procedure. In the forward stage, the entire D is split arbitrarily into overlapping sub-

    regions and model parameters are selected by minimizing a lack-of-fit criterion. On the

    backward stage, basis functions (variables) and knots that no longer contribute to the

    accuracy of the fit are removed. The lack-of-fit criterion is a modified version of the

    generalized cross validation criterion (MGCV) expressed as:

    ( )

    ( ( ) ) (15)

  • 17

    where C(M) is the number of parameters being fit, n is the total number of observations

    and d is a penalty factor. A representation of a MARS model compared to a parametric

    linear regression model is depicted in Figure 6.

    Figure 6: A MARS and a parametric linear regression model representation.

    The basic advantages of MARS are its data driven regression procedure, the lack of

    initial assumptions about data and the detection of an optimum input variable set

    through a lack-of-fit criterion. For the development of MARS models we used the

    software package ARESLab developed by Jekabsons (2011).

    2.5 Dataset

    As already stated, the scope of this paper is to create models that produce one period

    ahead forecasts for 5 nominal exchange rates, using monthly and daily data. The data

    span the period from 1/1/1999 to 30/10/2011, not including weekends and holidays. We

    use as explanatory variables data of 7 exchange rates, closing prices of major stock

    indices in the U.S. and the Euro zone, spot prices of 10 precious and non-precious

    metals, 18 commodities including crude oil and moving averages of 3,5,10 and 30 days

    of the EUR/USD exchange rate (since it is the exchange rate with the highest volume on

    the exchange rate market). We also include trade weighted USD indices compiled by the

    Federal Reserve Bank of Saint Louis (FRED), EURIBOR rates of various maturities and

    interest rate spreads between commercial paper, the federal funds and the effective

    federal funds rate. Finally, we include basic macroeconomic variables from the US, EU,

    Japan, UK, Australia, Norway, South Africa, Brazil and New Zealand.

  • 18

    Table 1: Input Variables Commodities

    Crude Oil





    Orange Juice





    Rough Rice

    Soybean Meal

    Soybean Oil


    Feeder Cattle

    Lean Hogs

    Live Cattle

    Pork Bellies

    Iron Ore

    Crude Oil

    Iron Ore















    Stock Indices

    Dow Jones

    Nasdaq 100

    S & P 500


    CAC 40


    USD Trade Weighted

    Indices Major partners

    Broad Index

    Other Partners

    Interest rates

    T-bill 6 months

    T-bill 10 years

    Spread MLP-EURIBOR 3M

    Spread MLP-Eonia

    Spread FF-CP

    Spread FF-EFF


    EURIBOR 1 Week

    ECB Interest rate

    EURIBOR 1 Month

    FED rate


    Japan/New Zealand/South

    Africa/ Brasil/ Australia/ UK


    Cunsumer Price Index

    Productivity index

    Gross Domestic Product

    Trade Balance

    Unemployment rate

    Exchange Rates








    Technical Analysis

    variables Relative Strengh




    MA10 EUR/USD

    MA30 EUR/USD


    EU Variables

    Unemployment rate


    Consumer Price


    Current accounts






    US variables

    Consumer Price




    trade balance

    US deficit or




    Unemployment rate

  • 19

    3. Empirical Results

    3.1.1 Statistical Evaluation

    A crucial step in measuring the generalization ability of a model is its out-of-sample

    forecasting performance. Thus, in order to test the generalization ability of the selected

    model, the dataset is split in two parts; the train subset and the test subset. The train/test

    set ratio chosen both for monthly and daily data is 80/20. Specifically, we leave aside 30

    monthly observations (6/2009-10/2011) and 648 daily observations (2/4/2009-

    30/10/2011) for out-of-sample forecasting. As forecast evaluation metrics we use the

    Root Mean Square Error (RMSE), the Root Mean Square Percentage Error (RMSPE)

    and the Mean Absolute Percentage Error (MAPE).

    ( )

    ( )






    ( )

    where is the forecasted ith value of the actual Yi rate and n is the total number of the


    After implementing EEMD to decompose each time series into a smoothed and the

    remaining fluctuating component, MARS selects the most informative input dataset for

    each component of the forecasted exchange rate.

    Table 2: Optimum input variable number selected by MARS



    Monthly data Daily data







    Smoothed Series

    EUR/USD 16 1 22 1

    USD/JPY 14 4 20 1

    AUD/NOK 11 7 19 1

    ZND/BRL 5 1 17 1

    ZAR/PHP 1 1 22 1

    As we observe from table 2, MARS selects only the smoothed component of the current

    exchange rate as input variable for daily forecasting of the smoothed component, while

  • 20

    for monthly forecasting of USD/JPY and AUD/NOK constructs a structural model of

    more input variables. For the fluctuating component, the input dataset consists from

    more variable for daily forecasting horizon and a less for monthly data.

    In order to test the forecasting ability of the EEMD-MARS-SVR model against

    alternatives, we compare it with a Random Walk (RW) with a drift, an autoregressive

    (using as input variables only the two decomposition components for each exchange rate)

    AR-SVR and an EEMD-AR-SVR model. The results from the evaluation of all models

    are depicted in tables 3 and 4.

    Table 3: Out-of-sample forecasting results on monthly exchange rates



    RW 0.0512 0.0379 0.0295

    AR-SVR(linear) 0.2260 0.0379 0.0294

    EEMD-AR-SVR(linear) 0.0426 0.0314 0.0239

    EEMD-MARS-SVR(RBF) 0.0407 0.0300 0.0238


    RW 2.4739 0.0281 0.0221

    AR-SVR(linear) 3.3696 0.0395 0.0354

    EEMD-AR-SVR(linear) 1.9430 0.0225 0.0181 EEMD-MARS-SVR(linear) 1.9996 0.0226 0.0177


    RW 0.1054 0.0189 0.0151

    AR-SVR(linear) 0.1693 0.0301 0.0261

    EEMD-AR-SVR(poly) 0.1017 0.0187 0.0144

    EEMD-MARS-SVR(poly) 0.1287 0.0233 0.0184


    RW 0.0359 0.0275 0.0220

    AR-SVR(RBF) 0.0342 0.0263 0.0209

    EEMD-AR-SVR(Sigmoid) 0.0253 0.0192 0.0158

    EEMD-MARS-SVR(Sigmoid) 0.0256 0.0194 0.0149


    RW 0.2075 0.0348 0.0270

    AR-SVR(linear) 0.2054 0.0344 0.0267

    EEMD-AR-SVR(poly) 0.1862 0.0317 0.0213 EEMD-MARS-SVR(Sigmoid) 0.2017 0.0337 0.0243

    Note: Best forecasted values are noted in bold. Best kernel is noted in parenthesis.

  • 21

    Table 4: Out-of-sample forecasting results on daily exchange rates



    RW 0.0096 0.0070 0.0055

    AR-SVR(linear) 0.0980 0.0070 0.0055

    EEMD-AR-SVR(linear) 0.0088 0.0064 0.0051

    EEMD-MARS-SVR(linear) 0.0069 0.0050 0.0040


    RW 0.5950 0.0068 0.0050

    AR-SVR(linear) 0.6088 0.0069 0.0053

    EEMD-AR-SVR(linear) 0.4800 0.0055 0.0042 EEMD-MARS-SVR(linear) 0.4946 0.0057 0.0043


    RW 0.0340 0.0063 0.0049

    AR-SVR(linear) 0.0343 0.0063 0.0050

    EEMD-AR-SVR(linear) 0.0292 0.0053 0.0042

    EEMD-MARS-SVR(poly) 0.0583 0.0102 0.0078


    RW 0.0106 0.0082 0.0064

    AR-SVR(linear) 0.0108 0.0082 0.0064

    EEMD-AR-SVR(RBF) 0.0093 0.0072 0.0057

    EEMD-MARS-SVR(linear) 0.0098 0.0076 0.0060


    RW 0.0661 0.0111 0.0085

    AR-SVR(linear) 0.0661 0.0111 0.0085

    EEMD-AR-SVR(linear) 0.0539 0.0091 0.0070

    EEMD-MARS-SVR(linear) 0.0510 0.0085 0.0066

    Note: Best forecasted values are noted in bold. Best kernel is noted in parenthesis.

    The model incorporating EEMD decomposition exhibits superior forecasting ability

    both on daily and monthly data. In all exchange rates but the EUR/USD, the

    autoregressive version of the EEMD-SVR model has the smallest RMSE, while for the

    EUR/USD series the structural version of the model with the variables selected by

    MARS for the fluctuating component has lower RMSE. With the exception of

    AUD/NOK series, both the EEMD-AR-SVR and the EEMD-MARS-SVR outperform

    the RW model, while for the AUD/NOK rate only the autoregressive EEMD-AR-SVR

    model is more accurate.

    The empirical findings are rather interesting and in line with our expectations about

    trading volumes and EMH. The EUR/USD trading volume accounts for 28% of the total

  • 22

    daily trading volume in exchange rate market5. A successful estimation of the future

    price of the exact exchange rate with the aforementioned trading volume could yield

    high trading profits. Thus we expect that the intense economic interest for the EUR/USD

    rate minimizes profit margin rapidly by moving the exchange rate to buy/sell

    equilibrium and ultimately leads market to efficiency.

    Indeed, our empirical findings are in favor of a weak form of market efficiency on the

    EUR/USD exchange rate market. Both with daily and monthly data, the structural

    variant of our proposed model presents better forecasting abilities, while for daily

    forecasting horizon the autoregressive model is only slightly better than the RW, in

    comparison to the structural one. So, we cannot reject the weak form efficiency for the

    EUR/USD rate. On the contrary, market efficiency is rejected for all other exchange

    rates, since the autoregressive model provides significantly better forecasts than the RW

    and the structural variant, which is in line with our expectations due to their lower daily

    trading volumes in comparison to EUR/USD.

    Overall, SVR models coupled with EEMD decomposition as a preprocessing step

    produce superior out-of-sample forecasts compared to the RW model, indicating the

    possibility of potential profits for a trading strategy based on the EEMD-SVR forecasts.

    3.2 Trading performance

    In the previous section we measured the forecasting ability of the EEMD-SVR model

    with forecasting performance metrics. Nevertheless, when it comes to trading, the

    interest of a market participant is the evaluation of a model by means of overall

    volatility and return, since statistical performance is not always synonymous to profit (as

    pointed out by Cheng et al (2005)). Specifically, we examine whether there are any

    additional economic gains for an investing strategy based on forecasts of the proposed

    model, in comparison to trading based on the RW model; i.e. that the current price of the

    exchange rate is the best approximation to future price.

    5 According to the triennial report of BIS (2010) EUR/USD accounts for the 28% of the total daily trading volume, USD/JPY for 14% and all the other exchange rates are well below 1%.

  • 23

    Sharpe ratio, or return-to-variability ratio, measures the risk-adjusted returns of a

    portfolio or investment strategy and is widely used by investment banks and asset

    management companies to evaluate investment performance. It measures the risk-

    adjusted performance of a portfolio and its mathematical expression is:


    where is the annualized return of the exchange rate and the annualized standard

    deviation of the trading strategy.

    In order to measure the profitability of the model, we transform our forecasts into

    returns with the formula:


    ) ( )

    where is the return at time t+1 and is the exchange rate at time t. The trading

    strategy is to stay long on the currency of the nominator of each exchange rate when the

    forecasted return is positive and go short (sell) the currency at the opposite occasion.

    When the forecasted return is zero, we are indifferent, keeping our previous position. In

    other words the problem breaks down to a binary problem of 2 states; up/down, buy/sell

    and in the special case stay neutral. For simplicity we assume that the investor begins

    with an initial amount of all five currencies of the nominator, he invests entirely to one

    of the two currencies for each exchange rate and receives his long/short decision based

    on the forecast value of the exchange rate he has on time t, a position that he closes at

    the end of the next trading period (t+1). At the end of the trading horizon, he measures

    the annualized return and standard deviation, given by:

    ( )


    where is the annualized return, the annualized standard deviation, the standard

    deviation computed at the end of the forecasting horizon for all out-of-sample forecasted

    returns and the maximum annual transactions possible, here 252 for daily and 12 for

  • 24

    monthly data. Since the vast majority of transactions is made through electronic trade

    platform such EBS and Reuters dealing 3000 thus minimizing trading costs, we assume

    the least transaction cost possible for all exchange rates as of 1 pip per trade (0.0001

    spread between buy and sell price of the exchange rate).

  • 25

    Table 6: Trading Performance on monthly data


    Annualized return (excluding costs)(%) 1,24 -0,44 8,12 0,66 -1,08 -2,61 1,12 -8,15 3,91 0,46

    Annualized volatility (excluding costs)(%) 12,18 8 8,80 8,17 1,76 2,85 4,02 5,81 9,02 8,83

    Sharpe ratio(excluding costs) 0,10 -0,06 0.92 0,08 -0,61 -0,92 0,28 -1,40 0,43 0,05

    Positions taken (annualized) 10 5 10 8 2 4 2 5 6 6

    Transaction costs (annualized)(%) 0,003 0,002 0.0005 0,0004 0.0022 0.0036 0,0078 0.0019 0,0049 0,0049

    Annualized return (including costs)(%) 1,21 -0,46 8,12 0,66 -1,08 -2,61 1,11 -8,17 3,90 0,46

    Sharpe ratio (including costs)(%) 0,09 -0,06 0,92 0,08 -0,62 -0.92 0,28 -1,40 0,43 0,05

    Table 7: Trading Performance on daily data


    Annualized return (excluding costs)(%) 31,02 2,25 24,23 0,67 6,44 -3,05 9,59 -3,24 45,37 0,15

    Annualized volatility (excluding costs)(%) 7,74 8,15 7,75 7,90 6,83 6,87 8,77 8,74 12,43 13,53

    Sharpe ratio(excluding costs) 4 0,28 3,13 0.08 0,94 -0,44 1,09 -0.37 3,65 0,01

    Positions taken (annualized) 121 125 142 134 116 117 124 123 128 120

    Transaction costs (annualized)(%) 0,44 0,45 0.09 0.007 0,11 0,11 0,48 0,48 0,10 0,01

    Annualized return (including costs)(%) 30,59 1,80 24,23 0.66 6,34 -3,15 9,10 -3,72 45,26 0,04

    Sharpe ratio (including costs)(%) 3,95 0,22 3,12 0.08 0,93 -0,46 1,04 -0,43 3,64 0.004

  • 26

    The relatively high Sharpe ratio of the proposed trading strategies are generally in line

    with the empirical findings of previous studies about portfolios including exchange rates

    (Rime et al, 2010, Sermpinis et al, 2013). Nevertheless, the computed ratios from our

    approach are higher than previous studies both in daily and monthly forecasting


    Summarizing the results of table 6 and 7, a strategy based on the forecasts of the

    proposed EEMD-SVR methodology exhibits positive annualized returns (after trading

    costs) both for daily and monthly data on all exchange rates, with the exception of the

    AUD/NOK monthly exchange rate. Overall, the EEMD-SVR based trading strategy

    outperforms the RW model, producing economic gains.

    6 Sarno, Valente, and Leon (2006) calculate Sharpe ratios of forward bias trading strategies that range between 0.16 and 0.88, while Lyons (2001) reports a Sharpe ratio of 0.48 for an equally weighted investment in six currencies.

  • 27

    4. Conclusion

    The proposed hybrid EEMD-MARS-SVR model is a novel forecasting approach based

    on data driven methods. By segregating variable time series to a smoothed and a

    fluctuating component and forecasting each one separately, our model outperforms the

    RW in out-of-sample forecasting on the Euro (EUR) /United States Dollar (USD),

    Japanese Yien (JPY) /USD, Norwegian Krone (NOK) / Australian Dollar (AUD)/, New

    Zealand Dollar (NZD) / Brazilian Real (BRL) and Philippine Peso (PHP) / South

    African Rand (ZAR) exchange rates. Moreover, the adoption of an autoregressive

    expression for all series rejects the EMH for all exchange rate markets, but the

    EUR/USD rate where we cannot reject the weak form of market efficiency. Overall,

    machine learning approximations coupled with signal processing methodologies for time

    series smoothing can be exploited for adopting trading strategies that yield economic



    This research has been co-financed by the European Union (European Social Fund

    ESF) and Greek national funds through the Operational Program "Education and

    Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research

    Funding Program: THALES. Investing in knowledge society through the European

    Social Fund.

    We would also like to thank the participants of the 75st International Atlantic Economic

    conference held in Vienna, Austria in April 2013, for their helpful comments.


    Alexandrov, T., Bianconcini, S., Bee Dagum, E., Maass, P. and Mc Elroy, T. (2009) A

    review of some modern approaches to the Problem of trend extraction, Research

    Report Series, Statistics vol. 3, U.S. Census Bureau, Washington.

    Berger, D.W., Chaboud, A.P., Chernenko, S.V., Howorka, E. and Wright, J.H. (2008)

    Order flow and exchange rate dynamics in electronic brokerage system data,

    Journal of International Economics, vol.75, pp. 93109.

  • 28

    Bilson, J. (1978) The monetary approach to the exchange rate some empirical evidence,

    IMF Staff Papers, vol. 25(1), pp. 4875.

    Bollerslev, T. and Wright, J. H. (2001) High-frequency data, frequency domain

    inference, and volatility forecasting, Review of Economics and Statistics, vol. 83,

    pp. 596602.

    Brandl B., Wildburger U. and Pickl S. (2009) Increasing of the fitness of fundamental

    exchange rate forecast models. International Journal of Contemporary

    Mathematical Sciences, vol. 4 (16), pp. 779-798.

    Chang C.-C. and Lin C.-J. (2011) LIBSVM: a library for support vector machines. ACM

    Transactions on Intelligent Systems and Technology, vol. 2 (27), pp. 1-27.

    Software available at

    Cheung Y-W. (1993) Long memory in foreign-exchange rates, Journal of Business and

    Economic Statistics, vol. 11, pp. 93- 101.

    Cheung,Y.-W., Chinn M. and Pascual A.G. (2005) Empirical Exchange Rate Models of

    the Nineties: Are any Fit to Survive?, Journal of International Money and Finance,

    vol. 24(7), pp. 115075.

    Cortes C. and Vapnik V. (1995) Support-Vector Networks, Machine Leaming, vol 20,

    pp. 273-297.

    Dornbusch, R., (1976) Expectations and Exchange Rate Dynamics, Journal of Political

    Economy, vol. 86, pp. 1161-1176.

    Dunis, C. and Williams, M. (2002) Modeling and Trading the EUR/USD Exchange Rate:

    Do Neural Network Models Perform Better ?, Derivatives Use, Trading and

    Regulation, vol.8 (3), pp. 211-239.

    Evans, M.D.D. and Lyons, R.K. (2008) How is macro news transmitted to exchange

    rates?, Journal of Financial Economics, vol. 88,pp. 2650.

    Evans, M.D.D. and Lyons, R.K. (2002) Order flow and exchange rate dynamics, Journal

    of Political Economy, vol. 110, pp. 170180.

    Evans, M.D.D. and Lyons, R.K. (2005) Do currency markets absorb news quickly?,

    Journal of International Money and Finance, vol. 24,pp. 197217.

    Fama, E (1965) The Behavior of Stock Market Prices, Journal of Business. vol. 38, pp.


    Fleming, J., Kirby, C. and Ostdiek, B. (2001) The economic value of volatility timing.

    Journal of Finance, vol. 56, pp. 329352.

  • 29

    Fleming,J.M. (1962),Domestic Financial Policies under Fixed and Floating Exchange

    Rates, IMF Staff Papers, pp. 36979.

    Frankel, J.A., (1979) On the Mark.: A Theory of Floating Exchange Rates Based on

    Real Interest Differentials, American Economic Review, vol. 69, pp. 610-622.

    Freidman J. (1991) Multivariate adaptive regression splines, Annals of Statistics, vol. 19,

    pp. 1-141.

    Galbraith, J.W. and Ksnbay T. (2005) Content horizons for conditional variance

    Forecasts, International Journal of Forecasting, vol. 21, pp. 249-260.

    Guo W. and Tse P. (2013), A novel signal compression method based on optimal

    ensemble empirical mode decomposition for bearing vibration signals, Journal of

    Sound and Vibration, vol.332, pp. 423-441.

    Hannan, E. J., and B. G. Quinn (1979) The Determination of the Order of an

    Autoregression, Journal of the Royal Statistical Society, B, vol.41, pp.190195.

    Hodrick, R.J. and Prescott, E.C. (1997), Postwar US business cycles: an empirical

    investigation, Journal of Money, Credit, and Banking, vol. 29, pp. 1-16.

    Huang, M. L. Wu, S. R. Long, S. S. Shen, W. D. Qu, P. Gloersen, and K. L. Fan (1998),

    The empirical mode decomposition and the Hilbert spectrum for nonlinear and

    non-stationary time series analysis, Procccedings of the Royal Society of London,

    vol. 454A, pp. 903-993.

    Ince H. and Trafalis T. (2006) A hybrid model for exchange rate prediction. Decision

    Support Systems, vol. 42(2), pp. 1054-1062.

    Jekabsons G. (2011), ARESLab: Adaptive Regression Splines toolbox for

    Matlab/Octave, available at

    Karemera D. and Kim B, (2006), Assessing the forecasting accuracy of alternative

    nominal exchange rate models: the case of long memory, Journal of Forecasting

    Volume 25, Issue 5, pages 369380.

    Killeen, W.P., Lyons, R.K. and Moore, M.J. (2006) Fixed versus flexible: lessons from

    EMS order flow, Journal of International Money and Finance, vol. 25, pp. 551


    Lucas,R.E. (1982)Interest Rates and Currency Prices in a Two-Country World, Journal

    of Monetary Economics, vol. 10, pp. 33560.

    Lyons,R.K. (2001), The Microstructure Approach to Exchange Rates, Cambridge, MA:

    MIT Press.

  • 30

    Macdonald, Ronald. (2010) Exchange Rate Economics: Theory and Evidence. 1st

    Edition. New York: Routledge.

    Mark, N.C. and D. Sul (2001), Nominal Exchange Rates and Monetary Fundamentals:

    Evidence from a Small Post-Bretton Woods Panel, Journal of International

    Economics, vol. 53(1), pp. 2952.

    Meese, R. and Rogoff K. (1983) Empirical Exchange Rate Models of the Seventies: Do

    they Fit out of Sample?, Journal of International Economics, vol. 14, pp. 324.

    Moghtaderi A., Flandrin P, and Borgnat P, (2013), Trend filtering via Empirical Mode

    Decompostition, Computational Statistics and Data Analysis, vol. 58, pp. 114-126.

    Molodtsova, T. and Papell, D.H., (2009) Out- of- sample exchange rate predictability

    with Taylor rule fundamentals. Journal of International Economics. vol. 77,


    Mundell, R.A. (1968), International Economics (New York: Macmillan).

    Premanode B. and Toumazou C. (2013) Improving prediction of exchange rates using

    Differential EMD, Expert Systems with Applications, vol. 40, pp. 377384.

    Rilling G., Flandrin, P. and Goncalves, P. (2005) Empirical mode decomposition,

    fractional Gaussian noise and Hurst exponent estimation. IEEE International

    Conference on Acoustics, Speech, and Signal Processing, pp. 489492.

    Rime D., Sarno L. and Sojli E. (2010), Exchange rate forecasting, order flow and

    macroeconomic information, Journal of International Economics, vol. 80, pp. 72-


    Sager M.J. and Taylor M.P. (2008) Commercially available order flow data and

    exchange rate movements: caveat emptor. Journal of Money and Credit Banking,

    vol. 40, pp. 583625.

    Sarno L., Valente G. and Leon H. (2006) Nonlinearity in Deviations from Uncovered

    Interest Parity: An Explanation of the Forward Bias Puzzle, Review of Finance,

    European Finance Association, vol. 10(3), pages 443-482.

    Sermpinis G., Theofilatos K., Karathanasopoulos A., Georgopoulos F. E., Dunis C.

    (2013) Forecasting foreign exchange rates with adaptive neural networks using

    radial-basis functions and Particle Swarm Optimization, European Journal of

    Operational Research, vol. 225, pp. 528540.

    Stockman, A. (1980) A Theory of Exchange Rate Determination, Journal of Political

    Economy, vol. 88, pp. 67398.

  • 31

    Vapnik, V., Boser, B. and Guyon, I. (1992) A training algorithm for optimal margin

    classifiers, Fifth Annual Workshop on Computational Learning Theory, Pittsburgh,

    ACM, pp.144152.

    West, K.D., Edison, H.J. and Cho, D. (1993) A utility based comparison of some models

    for exchange rate volatility. Journal of International Economics, vol.35,pp.2345.

    Wu, Z., and N. E Huang (2009) Ensemble Empirical Mode Decomposition: a noise-

    assisted data analysis method., Advances in Adaptive Data Analysis., vol. 1, No.1,

    pp. 1-41.

  • 32


    The generic EEMD method can be described as follows:

    1) Add white noise with a predefined variation to the time series under


    2) Detect local minima and maxima.

    3) With cubic interpolation compute the upper and lower envelopes from local

    minima and maxima respectively.

    4) Compute the mean value of the lower and the upper envelope. If a) the number

    of local minima, local maxima and zero crossing points vary at most by one and

    b) the mean is approximately zero, then we subtract the mean from the initial

    time series. The residual is the first IMF. If the criteria are not met then we go

    back to step (2) and repeat the procedure until criteria fulfillment.

    5) Repeat step (4) until the residual has no local maxima and minima.

    6) Go back to step (1) and repeat the process for different versions of added white


    7) Take the mean of the ensemble of all decompositions for each IMF as the final



View more >