Forecasting Exchange Rates Using Feedforward and Recurrent

Download Forecasting Exchange Rates Using Feedforward and Recurrent

Post on 08-Jan-2017




2 download

Embed Size (px)


<ul><li><p>Forecasting Exchange Rates Using Feedforward and Recurrent Neural NetworksAuthor(s): Chung-Ming Kuan and Tung LiuReviewed work(s):Source: Journal of Applied Econometrics, Vol. 10, No. 4 (Oct. - Dec., 1995), pp. 347-364Published by: John Wiley &amp; SonsStable URL: .Accessed: 18/12/2011 23:47</p><p>Your use of the JSTOR archive indicates your acceptance of the Terms &amp; Conditions of Use, available at .</p><p>JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range ofcontent in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new formsof scholarship. For more information about JSTOR, please contact</p><p>John Wiley &amp; Sons is collaborating with JSTOR to digitize, preserve and extend access to Journal of AppliedEconometrics.</p><p></p><p></p></li><li><p>JOURNAL OF APPLIED ECONOMETRICS, VOL. 10, 347-364 (1995) </p><p>FORECASTING EXCHANGE RATES USING FEEDFORWARD AND RECURRENT NEURAL NETWORKS </p><p>CHUNG-MING KUAN Department of Economics, 21 Hsu-chow Road, National Taiwan University, Taipei 10020, Taiwan </p><p>AND </p><p>TUNG LIU Department of Economics, Ball State University, Muncie, IN 47306, USA </p><p>SUMMARY </p><p>In this paper we investigate the out-of-sample forecasting ability of feedforward and recurrent neural networks based on empirical foreign exchange rate data. A two-step procedure is proposed to construct suitable networks, in which networks are selected based on the predictive stochastic complexity (PSC) criterion, and the selected networks are estimated using both recursive Newton algorithms and the method of nonlinear least squares. Our results show that PSC is a sensible criterion for selecting networks and for certain exchange rate series, some selected network models have significant market timing ability and/or significantly lower out-of-sample mean squared prediction error relative to the random walk model. </p><p>1. INTRODUCTION </p><p>Neural networks provide a general class of nonlinear models which has been successfully applied in many different fields. Numerous empirical and computational applications can be found in the Proceedings of the International Joint Conference on Neural Networks and Conference of Neural Information Processing Systems. In spite of its success in various fields, there are only a few applications of neural networks in economics. Neural networks are novel in econometric applications in the following two respects. First, the class of multilayer neural networks can well approximate a large class of functions (Hornik et al., 1989; and Cybenko, 1989), whereas most of the commonly used nonlinear time-series models do not have this property. Second, as shown in Barron (1991), neural networks are more parsimonious models than linear subspace methods such as polynomial, spline, and trigonometric series expansions in approximating unknown functions. Thus, if the behaviour of economic variables exhibits nonlinearity, a suitably constructed neural network can serve as a useful tool to capture such regularity. </p><p>In this paper we investigate possible nonlinear patterns in foreign exchange data using feedforward, and recurrent networks. It has been widely accepted that foreign exchange rates are I(1) (integrated of order one) processes and that changes of exchange rates are uncorrelated over time. Hence, changes in exchange rates are not linearly predictable in general. For a </p><p>comprehensive review of these issues, see Baillie and McMahon (1989). Since the empirical studies supporting these conclusions rely mainly on linear time series techniques, it is not unreasonable to conjecture that the linear unpredictability of exchange rates may be due to limitations of linear models. Hsieh (1989) finds that changes of exchange rates may be nonlinearly dependent, even though they are linearly uncorrelated. Some researchers also </p><p>CCC 0883-7252/95/040347-18 Received May 1992 ( 1995 by John Wiley &amp; Sons, Ltd. Revised September 1994 </p></li><li><p>C.-M. KUAN AND T. LIU </p><p>provide evidence in favor of nonlinear forecasts (e.g. Taylor, 1980, 1982; Engel and Hamilton, 1990; Engel, 1991; Chinn, 1991). On the other hand, Diebold and Nason (1990) find that nonlinearities of exchange rates, if any, cannot be exploited to improve forecasting. Therefore, we treat neural networks as alternative nonlinear models and focus on whether neural networks can provide superior out-of-sample forecasts. </p><p>This paper has two objectives. First, we introduce different neural network modeling techniques and propose a two-step procedure to construct suitable neural networks. In the first step of the proposed procedure, we apply the recursive Newton algorithms of Kuan and White (1994a) and Kuan (1994) to estimate a family of networks and compute the so-called 'predictive stochastic complexity' (Rissanen, 1987), from which we can easily select suitable network structures. In the second step, statistically more efficient estimates for networks selected from the first step are obtained by the method of nonlinear least squares using recursive estimates as initial values. Our procedure differs from previous applications of feedforward networks in economics (e.g. White, 1988; Kuan and White, 1990) in that networks are selected objectively. Also, the application of recurrent networks is new in applied econometrics; hence its performance would also be of interest to researchers. </p><p>Second, we investigate the forecasting performance of networks selected from the proposed procedure. In particular, model performance is evaluated using various statistical tests, rather than crude comparison. Financial economists are usually interested in sign predictions (i.e. forecasts of the direction of future price changes) which yield important information for financial decisions such as market timing (see e.g. Levich, 1981; Merton, 1981). We apply the market timing test of Henriksson and Merton (1981) to justify whether the forecasts from network models are of economic value in practice; a nonparametric test for sign predictions proposed by Pesaran and Timmermann (1992) is also conducted. Other than sign predictions, we, as many other econometricians, are also interested in out-of-sample MSPE (mean squared prediction errors) performance. We use the Mizrach (1992) test to evaluate the MSPE performance of networks relative to the random walk model. Our results show that network models perform differently for different exchange rate series and that predictive stochastic complexity is a sensible criterion for selecting networks. For certain exchange rates, some network models perform reasonably well; for example, for the Japanese yen and British pound some selected networks have significant market timing ability and/or significantly lower out-of- sample MSPE relative to the random walk model in different testing periods; for the Canadian dollar and deutsche mark, however, selected networks exhibit only mediocre performance. </p><p>This paper proceeds as follows. We review feedforward and recurrent networks in Section 2. The network building procedure, including the estimation methods, complexity regularization criteria, and a two-step procedure, are described in Section 3. Empirical results are analysed in Section 4. Section 5 concludes the paper. Details of the recursive Newton algorithms are summarized in the Appendix. </p><p>2. FEEDFORWARD AND RECURRENT NETWORKS </p><p>In this section we briefly describe the functional forms of feedforward and recurrent networks and their properties; for more details see Kuan and White (1994a). </p><p>A neural network may be interpreted as a nonlinear regression function characterizing the relationship between the dependent variable (target) y and an n-vector of explanatory variables (inputs) x. Instead of postulating a specific nonlinear function, a neural network model is constructed by combining many 'basic' nonlinear functions via a multilayer structure. In a feedforward network, the explanatory variables first simultaneously activate q hidden units in </p><p>348 </p></li><li><p>FORECASTING WITH NEURAL NETWORKS </p><p>an intermediate layer through some function I, and the resulting hidden-unit activations hi, i= 1, ..., q, then activate output units through some function i to produce the network output o (see Figure 1). Symbolically, we have </p><p>f n n </p><p>hi,,= Ty Yio + Ey,i . i = </p><p>1 </p><p>... . q j=1 </p><p>* q , (1) o, = PO + ,Ahi,, </p><p>i=! </p><p>or more compactly, P 4 t n ! </p><p>o,= a PO + E, Ai' Yo + YX,, i i=1 j= I (2) </p><p>=f,(x,, 0) where 0 is the vector of parameters containing all 4's and y's, and the subscript q of f signifies the number of hidden units in the network. </p><p>This is a flexible nonlinear functional form in that the activation functions v and b can be chosen quite arbitrarily, except that v is usually required to be a bounded function. Hornik et al. (1989) and Cybenko (1989) show that the function fq constructed in equation (2) can approximate a large class of functions arbitrarily well (in a suitable metric), provided that the number of hidden units, q, is sufficiently large. This property is analogous to that of nonparametric methods. As an example, consider the L2 approximation property. Given the dependent variable y and some explanatory variables x, we are typically interested in the unknown conditional mean M(x) - E(y Ix). The L2 approximation property asserts that if M(x) E L2, then for any e &gt; O, there is a q such that </p><p>EI M(x) - fq(x, O) 12&lt; (3) </p><p>Ot </p><p>Output Layer </p><p>Activation Function </p><p>h) , h2,t Hidden Layer </p><p>Activation Function </p><p>Input Layer </p><p>Figure 1. A simple feedforward network with one output unit, two hidden units, and three input units </p><p>349 </p></li><li><p>C.-M. KUAN AND T. LIU </p><p>Barron (1991) also shows that a feedforward network can achieve an approximation rate O(l/q) by using a number of parameters O(qn) that grows linearly in q, whereas traditional polynomial, spline, and trigonometric expansions require exponentially O(qe) terms to achieve the same approximation rate. Thus, neural networks are (asymptotically) relatively more parsimonious than these series expansions in approximating unknown functions. These two properties make feedforward networks an attractive econometric tool in (nonparametric) applications. </p><p>In a dynamic context, it is natural to include lagged dependent variables as explanatory variables in a feedforward network to capture dynamics. This approach suffers the drawback that the correct number of lags needed is typically unknown (this is analogous to the problem of determining the order of an autoregression). Hence. the lagged dependent variables in a network may not be enough to characterize the behaviour of y in some applications. To overcome this deficiency, various recurrent networks, i.e. networks with feedbacks, have been proposed. A recurrent network has a richer dynamic structure and is similar to a linear time- series model with moving average terms. In particular, we consider the following network due to Elman (1990) (see Figure 2): </p><p>n q </p><p>hi,t = yio + E lYiji, + E A1ih.,- j=l 1=1 </p><p>(x,, h, , 0), i= 1, ..., q (4) </p><p>o, = ( + Afi,(x,, h, , 0)) i '=1 / </p><p>pq(t,, h,_ 1, 0) </p><p>where 0 denotes the vector of parameters containing all /'s, y's, and 6's, and the subscript q of 0 again signifies the number of hidden units. Here, the hidden-unit activations hi feed back </p><p>ot </p><p>Output Layer </p><p>Activation Function </p><p>Feedback with Delay </p><p>hi,t ( . h2,t / Hidden Layer </p><p>Activation Function </p><p>lt 1 ,_ I Input Layer IT iT IT ^"^i.i~~~hl~-i ' 2,- </p><p>Figure 2. A simple Elman (1990) network with hidden-unit activations feedback </p><p>350 </p></li><li><p>FORECASTING WITH NEURAL NETWORKS </p><p>to the input layer with delay and serve to 'memorize' the past information (cf. equation (1)). From equation (4) we can write, by recursive substitution, </p><p>hi,,= ip(x,,ip(x,_, h,2, ..., 0), 0)= =.=:ri(x', 0) i=1,..., q (5) </p><p>where x' = (x,,x, x,...,xi), and ip is vector-valued with ipi as its ith element. Hence, hj, depends on x, and its entire history. It follows that </p><p>o, = (q(X,, h,_1, 0)- gq(X, 0) (6) </p><p>is also a function of x, and its history (cf. equation (2)). In view of equation (6), a recurrent network may capture more dynamic characteristics of y, than does a feedforward network. In the L2 context, a recurrent network may be interpreted as an approximation of E(y, Ix'). To ensure proper behaviour of the Elman (1990) network, Kuan and White (1994b) show that, aside from some regularity conditions on the data y and x and some smoothness conditions (such as continuous differentiability) on m and Y, the hidden unit activation function v must also be a contraction mapping in h, _; otherwise, hi, will approach its upper or lower bound very quickly when t is a bounded function or will explod whhen v is an unbounded function. Kuan et al. (1994) show that a sufficient condition assuring the contraction mapping property is 6i, </p></li><li><p>C.-M. KUAN AND T. LIU </p><p>equivalent to the NLS method. Although recursive estimates are not as efficient as NLS estimates in finite samples, they are useful when on-line information processing is important. Moreover, recursive methods can facilitate network selection, as discussed in the subsection below. White (1989) also suggests that one can perform recursive estimation up to certain time point and then apply a NLS technique to improve efficiency of estimates. </p><p>Similarly, the parameters of interest in a recurrent network q5 are </p><p>0 = argmin lim E y, - 0,(x,, h,_ , 0)12 t -&gt;00 </p><p>where limit is taken to accommodate the effects of network feedbacks h,_l (Kuan and White, 1994b). The estimates of 0* can also be obtained using nonrecursive or recursive methods. In view of equation (5) and (6), h, and o, depend on 0 directly and also indirectly through the presence of lagged hidden-unit activations h,_l. Thus, in calculating the derivatives of S q with respect to 0, parameter dependence of h,_l must be taken into account to ensure proper search direction. Owing to this parameter-dependent structure and the constraints required for 6's (discussed in Section 2), NLS optimization techniques involving analytic derivatives are difficult to implement. Our experience shows that NLS estimation using numerical derivatives usually suffers the problem of a singular information matrix. Alternatively, one could use a recursive estimation method such as the 'recurrent Newton algorithm' of Kuan (1994), which is analogous to that of Kuan and White (1994a) for feedforward networks. This algorithm is also root-t consistent for Oq (see e.g. Benveniste et al., 1990). Kuan (1994) also shows that it is more efficient than the recurrent BP algorithm of Kuan et al. (1994). The recursive Newton algorithms for feedforward and recurrent networks used in our applications are described in the Appendix. </p><p>3.2. Complexity regular...</p></li></ul>