optimizing the pairs-trading strategy using deep...

21
Research Article Optimizing the Pairs-Trading Strategy Using Deep Reinforcement Learning with Trading and Stop-Loss Boundaries Taewook Kim 1,2 and Ha Young Kim 3 Qra Technologies, Inc., Ttukseom-ro -gil, Sungdong-gu, Seoul , Republic of Korea Department of Financial Engineering, Ajou University, Worldcupro , Yeongtong-gu, Suwon , Republic of Korea Graduate School of Information, Yonsei University, Yonsei-ro, Seodaemun-gu, Seoul, , Republic of Korea Correspondence should be addressed to Ha Young Kim; [email protected] Received 6 February 2019; Revised 14 April 2019; Accepted 11 June 2019; Published 12 November 2019 Guest Editor: Benjamin M. Tabak Copyright © 2019 Taewook Kim and Ha Young Kim. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Many researchers have tried to optimize pairs trading as the numbers of opportunities for arbitrage profit have gradually decreased. Pairs trading is a market-neutral strategy; it profits if the given condition is satisfied within a given trading window, and if not, there is a risk of loss. In this study, we propose an optimized pairs-trading strategy using deep reinforcement learning—particularly with the deep Q-network—utilizing various trading and stop-loss boundaries. More specifically, if spreads hit trading thresholds and reverse to the mean, the agent receives a positive reward. However, if spreads hit stop-loss thresholds or fail to reverse to the mean aſter hitting the trading thresholds, the agent receives a negative reward. e agent is trained to select the optimum level of discretized trading and stop-loss boundaries given a spread to maximize the expected sum of discounted future profits. Pairs are selected from stocks on the S&P 500 Index using a cointegration test. We compared our proposed method with traditional pairs-trading strategies which use constant trading and stop-loss boundaries. We find that our proposed model is trained well and outperforms traditional pairs-trading strategies. 1. Introduction Pairs trading is a method for obtaining arbitrage profit when there is a statistical difference between two stocks with similar characteristics that are cointegrated or highly correlated. is is possible because of the statistical reason that spreads made by two stocks have a mean reversion in the long run [1]. In the early days, pairs-trading methods were popular because of the opportunity to obtain arbitrage profit [1–4]. However, as many investors including hedge funds sought these arbitrage opportunities by executing the pairs-trading strategy, its profitability began to deteriorate [5, 6]. To overcome these shortcomings, significant research has been conducted to improve the pairs-trading strategy [7–10]. e mechanism of pairs trading is as follows. First, a pair of stocks with similar trends is identified. Second, regression analysis such as ordinary least squares (OLS), total least squares (TLS), and error correction models (ECM) is used to calculate the spread of these stocks. Finally, if the spread hits preset boundaries, investors will open a portfolio which takes a long position on the undervalued stock and shorts the overvalued stock. Subsequently, if the spread reverses to the mean, investors will close the portfolios which are opposite position to the open portfolio. In this case, the investor obtains an arbitrage profit by executing this strategy. However, there is a risk when the spread does not reverse to the mean. In such a situation, investors are at high risk because they cannot close the portfolio. By setting a stop-loss boundary, investors can hedge the risk [11–13]. Many researchers have applied various statistical methods to improve the efficiency and performance of pairs trading. In particular, they focused on using the spread as a trading signal. e study in [1] collected pairs of stocks based on minimizing the sum of squared deviations between the two stocks and then executed the trading strategy if the difference between the pairs is twice the standard deviation of the spread. ey used normalized US stock price data from 1962 to 2002 to test the profitability of pairs trading. e Hindawi Complexity Volume 2019, Article ID 3582516, 20 pages https://doi.org/10.1155/2019/3582516

Upload: others

Post on 25-Jul-2020

10 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Research ArticleOptimizing the Pairs-Trading Strategy Using DeepReinforcement Learning with Trading and Stop-Loss Boundaries

Taewook Kim 12 and Ha Young Kim 3

1Qra Technologies Inc Ttukseom-ro 1-gil Sungdong-gu Seoul 04778 Republic of Korea2Department of Financial Engineering Ajou University Worldcupro 206 Yeongtong-gu Suwon 16499 Republic of Korea3Graduate School of Information Yonsei University 50 Yonsei-ro Seodaemun-gu Seoul 03722 Republic of Korea

Correspondence should be addressed to Ha Young Kim haimkgetupgmailcom

Received 6 February 2019 Revised 14 April 2019 Accepted 11 June 2019 Published 12 November 2019

Guest Editor Benjamin M Tabak

Copyright copy 2019 Taewook Kim and Ha Young Kim This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Many researchers have tried to optimize pairs trading as the numbers of opportunities for arbitrage profit have gradually decreasedPairs trading is a market-neutral strategy it profits if the given condition is satisfied within a given trading window and if notthere is a risk of loss In this study we propose an optimized pairs-trading strategy using deep reinforcement learningmdashparticularlywith the deep Q-networkmdashutilizing various trading and stop-loss boundaries More specifically if spreads hit trading thresholdsand reverse to the mean the agent receives a positive reward However if spreads hit stop-loss thresholds or fail to reverse to themean after hitting the trading thresholds the agent receives a negative reward The agent is trained to select the optimum levelof discretized trading and stop-loss boundaries given a spread to maximize the expected sum of discounted future profits Pairsare selected from stocks on the SampP 500 Index using a cointegration test We compared our proposed method with traditionalpairs-trading strategies which use constant trading and stop-loss boundaries We find that our proposed model is trained well andoutperforms traditional pairs-trading strategies

1 Introduction

Pairs trading is a method for obtaining arbitrage profit whenthere is a statistical difference between two stockswith similarcharacteristics that are cointegrated or highly correlatedThisis possible because of the statistical reason that spreads madeby two stocks have a mean reversion in the long run [1] Inthe early days pairs-tradingmethodswere popular because ofthe opportunity to obtain arbitrage profit [1ndash4] However asmany investors including hedge funds sought these arbitrageopportunities by executing the pairs-trading strategy itsprofitability began to deteriorate [5 6] To overcome theseshortcomings significant research has been conducted toimprove the pairs-trading strategy [7ndash10]

The mechanism of pairs trading is as follows First a pairof stocks with similar trends is identified Second regressionanalysis such as ordinary least squares (OLS) total leastsquares (TLS) and error correction models (ECM) is usedto calculate the spread of these stocks Finally if the spread

hits preset boundaries investors will open a portfolio whichtakes a long position on the undervalued stock and shortsthe overvalued stock Subsequently if the spread reversesto the mean investors will close the portfolios which areopposite position to the open portfolio In this case theinvestor obtains an arbitrage profit by executing this strategyHowever there is a risk when the spread does not reverseto the mean In such a situation investors are at high riskbecause they cannot close the portfolio By setting a stop-lossboundary investors can hedge the risk [11ndash13]

Many researchers have applied various statisticalmethodsto improve the efficiency and performance of pairs tradingIn particular they focused on using the spread as a tradingsignal The study in [1] collected pairs of stocks based onminimizing the sum of squared deviations between the twostocks and then executed the trading strategy if the differencebetween the pairs is twice the standard deviation of thespread They used normalized US stock price data from1962 to 2002 to test the profitability of pairs trading The

HindawiComplexityVolume 2019 Article ID 3582516 20 pageshttpsdoiorg10115520193582516

2 Complexity

study in [14] used the cointegration approach to protect thepairs-trading strategy from severe losses They applied anOLS method to create a spread and set various conditionsthat translated into trading actions From these models theyachieved a trading strategy with a minimum level of profitsprotected from risk of loss The results showed about an11 annualized excess return over the entire period Theresearch in [15] compared the distance and cointegrationapproaches for each high-frequency and daily dataset tocheck whether it is profitable for Norwegian seafood com-panies The performance is similar between two approachesReference [16] used a Kalman filter to calculate spread whichwas then used as a high-frequency trading signal on theshares constituting the KOSPI 100 Index He found that thepairs-trading strategyrsquos performance was significant on theKOSPI and was better during daily market conditions atmarket opening and closing Moreover [7] optimized a pairs-trading system as a stochastic control problemThey used theOrnstein-Uhlenbeck process to calculate spread as a tradingsignal and tested their model with simulated data the resultsshowed that their strategy performs well In addition [17]suggested the Ornstein-Uhlenbeck process to make a marketmicrostructure noise used as a trading signal in pairs tradingstrategy The performance is better under this method thanin traditional estimators such as ARIMA(11) and maximumlikelihood Reference [18] applied a cointegration methodto Chinese commodity futures from 2006 to 2016 to checkwhether pairs trading was suitable in that market They usedOLS regression to create spreads from the pairs Furthermore[10] applied a cointegration test to assorted pairs of stocks anda vector error-correction model to create a trading signal

It is important to set a boundary to optimize the pairs-trading strategy This boundary is a criterion for decidingwhether to execute a pairs-trading strategy If a low boundaryis set many strategies will be executed but profits will belower if a high boundary is set investors will get high returnswhen the strategy is executed However all this assumesthat mean reversion occurs If the spread does not returnto the average in the specified trading window losses willbe incurred If a low boundary is set the loss will be smallHowever if the strategy is executed with a high boundary theloss will increase Therefore the performance of pair tradingdepends on how the boundary is set Reference [14] suggestedtaking a minimum-profit condition which could be efficientto reduce losses in a pairs-trading system They set a tradingrule with a diverse open condition for example if the spreadis above 03 05 075 10 and 15 standard deviations Theyused the daily closing prices from January 2 2001 to August30 2002 of two stocks the Australia New Zealand Bankand the Adelaide Bank The results showed that as the opencondition value decreases the number of trades and profitsincreases Also [19] suggested optimal preset boundariescalculated from estimated parameters for the average tradeduration intertrade interval and number of trades and usedthem to maximize the minimum total profit They used thedaily closing price data from January 2 2004 to June 30 2005of seven pairs of stocks on the Australian Stock ExchangeThe results showed that their proposed method was efficientin making profits using the pairs-trading strategy Reference

[18] examined whether the pairs-trading strategy could beapplied to the daily return of Chinese commodity futuresfrom 2006 to 2016 using three methods classical closed-loop and dynamic stop-loss The closed-loop method takesonly a stop-profit barrier which executes the strategy anddoes not consider the risk if spreads revert to the mean Theclassical method adds stop-loss boundaries to the closed-loop method The dynamic stop-loss method uses a varietyof stop-profit and stop-loss barriers to fit the spreads if thespread is larger than the standard deviation which is set usingcriteria based on the historical average of spreads The resultsshowed that these methods obtained an annualized return ofover 15 especially the closed-loop method which yieldedthe highest profit of 2694 In addition [20] experimentedwith fixed optimal threshold selection conditional volatilitypercentile spectral analysis and neural network thresholds inpairs-trading strategy Of these the neural network thresholdhas outperformed all other strategies

Following the success of reinforcement learning demon-strated by its successful performance at Atari games [21]many researchers have attempted to apply this algorithmto the financial trading system Reference [22] proposed adeep Q-trading system using reinforcement learning meth-ods They applied Q-learning to a trading system to tradeautomatically They set a delta price using data from the past120 days had three discrete action spaces (buy hold andsell) and used long-term profit as a reward They used dailydata from January 01 2001 to December 31 2015 of theHang Seng Index and the SampP 500 Index The experimentalresults showed that their proposed method outperformedbuy-and-hold strategies and recurrent reinforcement learn-ing methods Reference [23] proposed three steps to applyreinforcement learning to the financial trading system Firstthey reduced relative replay size to fit financial tradingSecond they proposed an action-augmentation techniquethat provides more feedback from the action to the agentThird they used long sequences as reinforcement data toconduct recurrent neural network trainingThe experimentaldata comprised tick-by-tick data of 12 forex currency pairsfrom January 2012 to December 2017The results showed thatthe action-augmentation technique yielded more profit thanan epsilon-greedy policy Reference [10] used an N-armedbandit problem to optimize the pairs-trading strategy Theytook the spread using an error-correction model and foundthe parameters using a grid-search algorithmThey comparedtheir proposed model with a constant parameter modelwhich was similar to a traditional pairs-trading strategyTheyused intraday one-minute data of some stocks in the FactSetdatabase from June 2015 to January 2016The performance oftheir proposedmodel was better than the constant-parametermodel

We investigate not only the dynamic boundary basedon a spread in each trading windowmdashwhich can achievehigher profit than the fixed boundary used in traditionalpairs trading strategymdashbut also if it is possible to train deepreinforcement learning methods to follow this mechanismTo this end we propose a new method to optimize thepairs trading strategy using deep reinforcement learningespecially deep Q-networks since pairs trading strategy can

Complexity 3

be thought of as a game After opening a portfolio positionthe profit can be set whether portfolio is closed stop-lossposition Therefore if we set this strategy as a game bysetting boundaries which are optimized in spreads in tradingwindow we can achieve more profit than traditional pairstrading strategies In particular we set the pairs-trading sys-tem to be a kind of game and obtain the optimal boundariestrading thresholds and stop-loss thresholds according to thecalculated spread The reason for this construction is that ifthe portfolio is opened and closed in the trading window inthe calculated spread it will be unconditionally profitable ifthe portfolio is closed If the portfolio reaches the stop-lossboundary or does not converge to themean lossesmay occurWe therefore set the DQN to learn by positively rewarding itif it takes a closed position and negatively rewarding it if itreaches the stop-loss or exit thresholds We conducted thefollowing experiments to verify that our proposed methodis optimized compared to the conventional method Firstwe used different spreads calculated using OLS and TLS tosee how the results differ depending on the spread usedfor input Second depending on the formation window andtrading window the spread and hedge ratio will be variedWe therefore set a total of six window sizes for selecting theoptimalwindow sizewhich had the best performance Finallywe compared the proposedmethodwith the traditional pairs-trading strategy using the test data with the optimal windowsize In this experiment we use the daily adjusted closingprices from January 2 1990 to July 31 2018 of 50 stocksin the SampP 500 Index Experimental results show that ourproposed method outperforms the traditional pairs-tradingstrategy across all the pairs In addition we can confirm thatthe performance measure varies according to the spread

The main contributions of this study are as follows Firstwe propose a novel method to optimize pairs trading strat-egy using deep reinforcement learning especially deep Q-networks with trading and stop-loss boundaries The exper-imental results show that our method can be applied in thepairs trading system and also to various other fields includingfinance and economics when there is a need to optimize arule-based strategy to be more efficient Second we proposean optimized dynamic boundary based on a spread ineach trading window Our proposed method outperformstraditional pairs trading strategy which set a fixed boundaryLast we find that our method outperforms traditional pairstrading strategy in all pairs based on constituent stocks inSampP 500 Since our method selects optimal boundaries basedon spreads it can be applied to other stock markets such asKOSPI Nikkei and Hang Seng It should be noted that thepresent work is a part of the Master thesis [24]

The rest of this paper is organized as follows Section 2explains the technical background Section 3 describes thematerials and methods Section 4 shows the results andprovides a discussion of the experiments Section 5 providesour conclusions to this study

2 Technical Background

21 13e Traditional Pairs-Trading Strategy Pairs tradingis a representative market-neutral trading strategy which

simultaneously longs an undervalued stock and shorts anovervalued stockThis strategy is a formof statistical arbitragetrading that assumes the movements of the prices of thetwo assets will be similar to previous trends [1] It followsthe assumption that asset prices will return to the long-termequilibriumThis strategy started from the idea that arbitrageopportunities exist when the price gap between two assetsexpands to or past a certain level It is also based on the beliefthat historical price movements will not change significantlyin the future

In Figure 1 the graph drawn in blue is a spread made oftwo stocks that are cointegrated the red lines are the tradingboundaries and the green lines are the stop-loss boundariesWhen this spread reaches the trading boundaries the port-folio is opened and only closed when the spread returns tothe average However losses are incurred when prices reachthe stop-loss boundaries after the portfolio is opened and donot return to the average Furthermore after the portfolio isopened if the trading signal is not reversed to mean duringthe trading window the portfolio is closed by force this iscalled the exit position of the portfolio

211 13e Cointegration Test There are many approachesfor pair selection such as the discrete approach [11 25ndash27]the cointegration approach [10 16 27] and the stochasticapproach [7 8] In this study we use the cointegrationapproach to choose pairs which have long-term equilibriumGenerally a linear combination of nonstationary variables isalso a nonstationary relationship Assume that 119909119905 and 119910119905 haveunit roots as previously mentioned the linear combinationof these variables follows nonstationary conditions

119909119905 sim 119868 (1) 119910119905 sim 119868 (1) (1)

119910119905 = 120572 + 120573119909119905 + 120576119905 (2)

However it can be a stationary relationship if the nonsta-tionary variables are cointegrated In this case this regressionmust be checked to determine whether it is a spuriousregression or cointegrated Johansenrsquos method is widely usedto test for cointegration [28] In this method the numberof cointegration relations and the parameters of the modelare estimated and tested using maximum likelihood estima-tion (MLE) Since all variables are regarded as endogenousvariables there is no need to select dependent variablesand multiple cointegration relationships are identified Inaddition we use MLE to estimate the cointegration relationwith the vector autoregression model and to determinethe cointegration coefficient based on the likelihood-ratiotest There is therefore an advantage in performing varioushypothesis tests related to the estimation of cointegrationparameters and the setting of other models when there iscointegration and not merely to test for cointegration

22 Spread Calculation

221 Ordinary Least Squares In regression analysis OLS iswidely used to estimate parameters by minimizing the sum

4 Complexity

Z-sc

ore

trading signaltrading boundarystop-loss boundary

Year

8

6

4

2

0

minus2

minus4

minus6

minus82009200720052003200119991997199519931991

Figure 1 The traditional pairs-trading strategy

of the squared errors [29] Assume that 119909119894 119910119894 and 120576119894 arean independent variable a dependent variable and an errorterm We can estimate 120573 from the following equation bytaking a partial derivative

119910119894 = 120573119909119894 + 120576119894 sim 119873(0 1205902120576 ) (3)

119899sum119894=1

(119910119894 minus 120573119909119894)2 (4)

120573 = ( 119899sum119894=1

1199091198941015840119909119894)minus1 119899sum119894=1

1199091198941015840119910119894 (5)

The value obtained from equation (5) is used for the numberof stock orders The epsilon value is also used as a tradingsignal through Z-scoring in the state composed of theformation-window size

222 Total Least Squares TLS estimates parameters to min-imize the sum of the measured distance and the verticaldistance between regression lines [30] Since the verticaldistance does not change when the X and Y coordinates arechanged the value of 120573 is calculated consistently In the TLSmethod the observed values of 119883119894 and 119884119894 have the followingerror terms

119884119894 = 119910119894 + 119890119894 sim 119873(0 1205902119890 ) (6)

119883119894 = 119909119894 + 119906119894 sim 119873(0 1205902119906) (7)

where 119909119894 and 119910119894 are true values and 119890119894 and 119906119894 are errorterms following independent identical distributions It isassumed that there is linear combination of true values Forconvenience we represent the error variance ratio in equation(10)

119910119894 = 1205730 + 1205731119909119894 (8)

119884119894 = 1205730 + 1205731119909119894 + 119890119894 sim 119873(0 1205902119890 ) (9)

120591 = var (119884119894 | 119909119894)var (119883119894 | 119909119894) = 12059021198901205902119906 (10)

The orthogonal regression estimator is calculated by mini-mizing the sum of the measured distance and the verticaldistance between regression lines in equation (11)

119899sum119894=1

(119884119894 minus 1205730 + 1205731119909119894)2120591 + (119883119894 minus 119909119894)2 (11)

1205731 = 1199042119884119884 minus 1205911199042119883119883 + (1199042119884119884 minus 1205911199042119883119883)2 + 41205911199042119883119884122119904119883119884

(12)

The value obtained from equation (12) is used in the sameway as that obtained from equation (5) and the epsilon valueis also used as a trading signal through the Z-score in the statecomposed of the formation-window size

23 Reinforcement Learning and the Deep Q-Network Theidea of reinforcement learning is to find an optimal policywhich maximizes the expected sum of discounted futurerewards [31] These rewards come from selecting the optimalvalue of each action called the optimal Q-value Rein-forcement learning basically solves the problem defined bythe Markov decision process (MDP) It consists of a tuple(119878 119860 119875 119877 120574) where 119878 is a finite set of states 119860 is a finite setof actions 119875 is a state transition probability matrix 119877 is areward function and 120574 is a discount factor In environment120576 agent-observed state 119904119905 at time 119905 action 119886119905 is selectedFrom the results of these sequences environmental feedbackis provided to the agent in the form of reward 119903119905 and nextstate 119904119905+1 An action is selected by the action-value function119876120587(119904 119886) that represents the expected sum of discountedfuture rewards

119876120587 (119904119905 119886119905) = E120587 [ 119879sum119894=119905

120574119894minus119905119903119894 | 119904119905 119886119905 120587] (13)

In this action-value function 119876120587(119904119905 119886119905) we find an optimalaction-value function 119876lowast(119904119905 119886119905) following an optimal policy

Complexity 5

which maximizes the expected sum of discounted futurerewards

119876lowast (119904119905 119886119905) = max120587

119876120587 (119904119905 119886119905) (14)

This optimal action-value function can be formulated as theBellman equation

119876lowast (119904119905 119886119905) = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1)] (15)

TheDQNuses a nonlinear function approximator to estimatethe action value function This network is trained by min-imizing a sequence of loss functions 119871 119905(120579119905) which changeswith each sequence of 119905 The weight of 120579119905 is updated as thesequence progresses

119871 119905 (120579119905) = E(119904119886)sim120588(∙) [(119910119905 minus 119876 (119904119905 119886119905 120579119905))2] (16)

119910119905 = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1 120579119905minus1) | 119904119905 119886119905] (17)

3 Materials and Methods

31 Data In this study 50 stocks from the SampP 500 Indexwere selected based on their trading volume and marketcapitalization To carry out the experiment the data mustcover the same period Therefore corresponding stocks wereselected leaving a total of 25 stocks Table 1 represents thedataset of stock names abbreviations of those stocks andtheir respective sectors We collected the adjusted dailyclosing prices using Thomson Reutersrsquo database The periodof the training dataset is from January 2 1990 to December31 2008 comprising 4792 data points the test dataset coversthe period from January 2 2009 to July 31 2018 comprising2411 data points From these datasets a pair of stocks willbe selected during the training dataset period using thecointegration test

32 Selecting PairsUsing theCointegrationTest It is necessaryto pair stocks which have long-run statistical relationshipsor similar price movements It is possible to determine thedegree to which two stocks have had similar pricemovementsthrough the correlation value Furthermore the long-termequilibrium of a pair of stocks is an important characteristicfor the execution of pairs trading In this study we usedthe cointegration approach to select pairs of stocks ThroughJohansenrsquos method we selected 11 pairs of stocks that havelong-run equilibria Table 2 shows the resulting pairs of stocksthat were identified based on t-statistics and Figure 2 showsprice movements of the cointegrated stocks XOM and CVXUsing this dataset we will verify whether our proposedmethod has better performance than the traditional pairs-trading method

33 Trading Signal After selecting the pairs it is necessaryto extract the signal for trading To extract signals we optfor the OLS or TLS methods First because the stock pricefollows a random walk [32] we need to ensure that it followsthe 119868(1) process through the augmented Dickey-Fuller testSubsequently the 119868(0) process should be created using the

logarithmic difference in stock prices which is then applied tothe OLS and TLS methods In equation (18) 1205721 is a constantvalue 1205731 is a hedge ratio (which is used as trading size) 120576119905is the error term and log119875119860119905 and log119875119861119905 are the logarithmicdifferences in the stock prices 119860 and 119861 at time 119905 We convertvalues of 120576119905 into a Z-score used as a trading signal Forexample if the trading signal reaches the threshold we shortone share of the overvalued stock (represented as log119875119860119905)and long 1205731 shares of the undervalued stock (representedas log119875119861119905) The hedge ratio is determined based on thewindow size We set a total of six discrete window sizes toobtain the optimal window size for the experiment Tradingwindows are constituted using half of the formation-windowsizeThe spread obtained here is used as a state when applyingreinforcement learning (ie as an input of the DQN)

log119875119861119905 = 1205721 + 1205731 log119875119860119905 + 120576119905 (18)

34 Proposed Method Optimized Pairs-Trading Strategy UsingtheDQNMethod In this study we optimize the pairs-tradingstrategy with a type of game using the DQN We will attemptto implement an optimal pairs-trading strategy by takingoptimal trading and stop-loss boundaries that correspond tothe given spread since performance depends on how tradingand stop-loss boundaries are set in pairs trading [14] Figure 3shows themechanism of our proposed pairs-trading strategyThroughout the cointegration test we identify pairs andusing regression analysis obtain a hedge ratio used as tradingvolume and a spread used as a trading signal and state In thecase of the DQN two hidden layers are set up and the numberof neurons is optimized by taking half of input size throughtrial and error Action values consist of the six discrete spacesin Table 3 Each value of 119886119905 has values for trading and stop-loss boundaries

A pairs-trading system can make a profit if the spreadtouches the threshold and returns to the average such that theportfolio is closed in each tradingwindowOn the other handif the trading boundary is touched and the stop-loss boundaryis reached the system tries to minimize losses by stoppingtrades If the spread touches the trading boundary but fails toreturn to the average the strategy may end up with a profitor a loss In this study the pairs-trading strategy is thereforeconsidered as a kind of game closing a portfolio yields a posi-tive reward and a portfolio that reaches its stop-loss thresholdyields a negative reward Although an exited portfolio maypossibly generate a positive profit there is also a possibilitythat losses will occur and it is therefore set to yield a negativerewardWe set the other conditions (such as the maintenanceof the portfolio or not to execute the portfolio) to zero so asto concentrate on the close stop-loss and exit positions

119882119905 =100381610038161003816100381610038161003816100381610038161003816V119860119905 times

1198781198601199051015840 minus 119878119860119905119878119860119905 + V119861119905 times 1198781198611199051015840 minus 119878119861119905119878119861119905100381610038161003816100381610038161003816100381610038161003816 119905 lt 1199051015840 (19)

119877119905

=

1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119888119897119900119904119890119889minus1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119903119890119886119888ℎ119890119904 119904119905119900119901 minus 119897119900119904119904 119905ℎ119903119890119904ℎ119900119897119889minus500 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119890119909119894119905119890119889

(20)

6 Complexity

Table 1 The 25 stocks on the SampP 500 Index used in this study

No Ticker Stock Sector1 AAPL Apple Inc Technology2 MSFT Microsoft Corporation Technology3 BRKa Berkshire Hathaway Inc Financial Services4 JPM JPMorgan Chase amp Co Financial Services5 JNJ Johnson amp Johnson Healthcare6 XOM Exxon Mobil Corporation Energy7 BAC Bank of America Corporation Financial Services8 WFC Wells Fargo amp Company Financial Services9 WMT Walmart Inc Consumer Defensive10 UNH UnitedHealth Group Incorporated Healthcare11 CVX Chevron Corporation Energy12 T ATampT Inc Communication Services13 PFE Pfizer Inc Healthcare14 ADBE Adobe Systems Incorporated Technology15 MCD McDonaldrsquos Corporation Consumer Cyclical16 MDT Medtronic plc Healthcare17 MMM 3M Company Industrials18 HON Honeywell International Inc Industrials19 GE General Electric Company Industrials20 ABT Abbott Laboratories Healthcare21 MO Altria Group Inc Consumer Defensive22 UNP Union Pacific Corporation Industrials23 TXN Texas Instruments Incorporated Technology24 UTX United Technologies Corporation Industrials25 LLY Eli Lilly and Company Healthcare

Year

Pric

e

20172013200920052001199719931989

140

120

100

80

60

40

20

XOMCVX

Figure 2 Cointegrated stock price movements

We fix the values of portfolio close stop-loss and exitto +1000 minus1000 and minus500 respectively When we updatethe Q-values we must consider the reward as a significantcomponent of efficiently training the DQN We therefore setthe reward value to have a range similar to that of the Q-value Additionally we included the corresponding profit orloss value to reflect that weight after the trading ended Inequation (19) V119860119905 and V119861119905 are the stock orders of stocks119860 and

119861 at time 119905 119878119860119905 and 119878119861119905 are the stock prices of119860 and 119861 at time119905 and 1198781198601199051015840 and 1198781198611199051015840 are the stock prices of119860 and 119861 at time 1199051015840Algorithm 1 shows the process of our proposed method

Beforewe start our proposedmethodwe set a replaymemoryand batch size and select pairs using the cointegration testAt each epoch we initialized total profit to 10 In thetraining scheme we set a state which has spreads withinthe formation window and select actions which are used as

Complexity 7

50 constituent stocks of the SampP500 Index

Filter out pairs based on trading volume liquidity andthe cointegration test

Obtain a rewardEnvironmentConstruct pairs of stocks

Preprocess datasetusing OLS or TLS Select max Q-value

Q_valuesOutputs of DQN

Deep Q-NetworkAgent

SpreadHedge ratio

Inputs of DQN

Figure 3 Steps for proposed pairs-trading strategy using the DQNmethod

Initialize replay memory119863 and batch size119873Initialize deep Q-networkSelect pairs using cointegration test(1) For each epoch do(2) Profit = 10(3) For steps t = 1 until end of training data set do(4) Calculate spreads using OLS or TLS methods(5) Obtain initial state by converting spread to Z-score based on formation window 119904

119905

(6) Using epsilon-greedy method select a random action 119886119905

(7) Otherwise select 119886119905= 119886119903119892119898119886119909

119886119876(119904119905 119886)

(8) Execute traditional pairs-trading strategy based on the action selected(9) Obtain reward 119903

119905by performing the pairs-trading strategy

(10) Set next state 119904119905+1

(11) Store transition (119904119905119886119905 119903119905 119904119905+1) in119863

(12) Sample minibatch of transition (119904119905119886119905 119903119905 119904119905+1) from119863

(13) 119910119905= 119903119905

119894119891 119904119905+1 = 119905119890119903119898119894119899119886119897

119903119905+ 120574119898119886119909

1198861015840119876 (119904119905+11198861015840) 119894119891 119904119905+1 = 119899119900119899 minus 119905119890119903119898119894119899119886119897

(14) Update Q-network by performing a gradient descent step on 119910119905minus119876(119904

119905119886)2

(15) End(16) End

Algorithm 1 Optimized pairs-trading system using DQN

trading and stop-loss boundaries Throughout the tradingwindow we executed a strategy similar to a traditional pairs-trading strategy using the action selected After executingthe strategy we obtain a reward based on the results of theportfolio Finally for the Q-learning process we update theQ-networks by performing a gradient descent step

35 Performance Measure We check our experiment resultsbased on profit maximum drawdown and the Sharpe ratioProfit is commonly used as a performance measure fortrading strategies It is calculated as the sum of returnstaking into consideration trading cost Since many trades canincrease total profit it is necessary to determine the totalprofit taking into consideration transaction costs dependingon trading volume In this study we set a trading cost of 5 bpequation (21) is almost the same as equation (19) but it doesnot include absolute value and 119862 is trading cost Maximumdrawdown represents themaximum cumulative loss from the

highest to the lowest values of the portfolio during a giveninvestment period where119875(119905) is the value of the portfolio and119879 is the terminal time value The Sharpe ratio is an indicatorof the degree of excess profits from investing in risky assetsused in evaluating portfolios [33] In equation (23) 119877119901 is theexpected sum of portfolio returns and 119877119891 is the risk-free ratewe set this value to 0 and 120590119901 is the standard deviation ofportfolio returns

119875119903119900119891119894119905 = 119879sum119905=1

[(V1119905 lowast 11987811199051015840 minus 11987811199051198781119905 + V2119905 lowast 11987821199051015840 minus 11987821199051198782119905 )

minus 119862 lowast (V1119905 + V2119905)](21)

119872119863119863(119879) = max120591isin(0119879)

(max119905isin(0120591)

119875 (119905) minus 119901 (120591)) (22)

8 Complexity

Table 2 Summary statistics for pairs verified using cointegrationtests

No Pairs t-statistic Correlation1 MSFTJPM minus35423lowastlowast 091652 MSFTTXN minus3448lowastlowast 086413 BRKaABT minus35148lowastlowast 094934 BRKaUTX minus33992lowastlowast 096095 JPMT minus35882lowastlowast 084866 JPMHON minus58209lowast lowast lowast 092507 JPMGE minus34494lowastlowast 091058 JNJWFC minus35696lowastlowast 096939 XOMCVX minus405lowast lowast lowast 0987910 HONTXN minus40625lowast lowast lowast 0746911 GETXN minus3467lowastlowast 09148Note lowast lowast lowast and lowastlowast denote a rejection of the null hypothesis at the 1 and5 significance levels respectively

119878ℎ119886119903119901119890 119903119886119905119894119900 = 119877119901 minus 119877119891120590119901 (23)

The Materials and Methods section should contain sufficientdetails so that all procedures can be repeated It may bedivided into headed subsections if several methods aredescribed

4 Results and Discussion

We use the stock pair XOM and CVX which rejects the nullhypothesis at the 1 significance level to verify whether ourproposed model is trained well The lengths of the windowsizes such as the formation window and trading windoware selected from the performance results with the trainingdataset From these results we select an optimized windowsize and compare our proposed model with traditional pairstrading which takes a constant set of actions with the testdataset

41 Training Results To find the optimum window sizefor the optimized pairs-trading system we experimentedwith six cases We performed the experiments based onsix window sizes and the results for each window size arecalculated by averaging the top-5 results for a total of 11 pairsFrom Tables 4 and 5 we can find that the best performanceis obtained when the formation and training windows are 30and 15 respectively based on the profit generated by both theOLS and TLS methods When we trained our networks weset a positive reward for taking more closed positions andfewer stop-loss and exit positionsWe canfind the lowest ratioof portfolio closed positions based on the number of openpositions which in the formation and trading windows arefor 30 and 15 days (068) Contrary to this result the highestratios of the number of closed positions in the formation andtrading windows are for 120 and 60 days (073) Howeverthe highest profits reported in the formation and tradingwindows are for 30 and 15 days This can be explained whenwe check the ratio of the number of stop-loss portfolios

The formation and trading window sizes are 30 and 15 daysand the ratio of portfolio stop-loss position is 013 but theformation and trading window sizes are 020 This resultindicates that it is important to reduce the stop-loss positionwhile increasing the closed position In addition we can seethat the trading signals made with the TLS method are betterthan thosemadewith theOLSmethod in all six of the discretewindow sizes The reason for this is based on the differencebetween the hedge ratios of the two methods In OLS whenone side is the reference the relative change of the other sideis estimated Since the assumption is that there is no errorcomponent on the reference side and there is an error onlyon the other side the hedge ratio varies depending on the sideused as the reference However in TLS hedging ratios are thesame regardless of which side is used as the reference For thisreason the experimental results confirm that the TLSmethodis better able to determine when to execute the pairs-tradingstrategy From these results we take the optimum windowsize when we verify our proposed method in the test datasetHowever we first need to ensure that the model we proposedis well-trained

It is important to check whether our reinforcementlearning algorithm is trained well Reference [21] suggestedthat a steadily increasing average of Q-values is evidence thatthe DQN is learning well Figure 4(a) shows the average Q-values of HON and TXN as training progressed We findthat the average Q-values steadily increased indicating thatour proposed model is properly trained In addition weprovide a positive reward when the portfolio closes and anegative reward when the portfolio reaches the stop-lossthreshold or exits Figure 4(b) shows the ratio of the numberof portfolio positions as training progressed The ratio ofclosed to open portfolio positions increased and the ratioof portfolios reaching their stop-loss thresholds to openportfolio positions decreased We also find that the ratio ofportfolio exits to open portfolio positions slightly increasedIt is possible that the rewards given for an open portfolioposition compared to those given for a closed portfolioposition are relatively small The DQN is therefore trainedto prevent portfolios from reaching their stop-loss thresholds(the more important objective) over exiting them This resultcan also serve as a basis for judging whether the proposedmodel is being trained properly

Tables 6 and 7 represent the performance results of XOMand CVX in the training dataset We call our proposedmodel pairs-trading DQN (PTDQN) and traditional pairstrading with constant action values as pairs trading withaction 0 (PTA0) to pairs trading with action 5 (PTA5) Fromthis result we can confirm that our proposed method ismore profitable than the constant pairs-trading strategiesIn addition we can see that the TLS method has a higherprofitability compared to the OLS method From PTA0 toPTA5 the trading boundary and the stop-loss boundarygrew larger the numbers of open and closed portfolios andportfolios that reached their stop-loss thresholds are reducedIn other words there is less opportunity for profit but theprobability of loss is also reduced It is important not only totake a lot of closed positions but also to take the best actionto open and close the portfolio For example if a portfolio is

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 2: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

2 Complexity

study in [14] used the cointegration approach to protect thepairs-trading strategy from severe losses They applied anOLS method to create a spread and set various conditionsthat translated into trading actions From these models theyachieved a trading strategy with a minimum level of profitsprotected from risk of loss The results showed about an11 annualized excess return over the entire period Theresearch in [15] compared the distance and cointegrationapproaches for each high-frequency and daily dataset tocheck whether it is profitable for Norwegian seafood com-panies The performance is similar between two approachesReference [16] used a Kalman filter to calculate spread whichwas then used as a high-frequency trading signal on theshares constituting the KOSPI 100 Index He found that thepairs-trading strategyrsquos performance was significant on theKOSPI and was better during daily market conditions atmarket opening and closing Moreover [7] optimized a pairs-trading system as a stochastic control problemThey used theOrnstein-Uhlenbeck process to calculate spread as a tradingsignal and tested their model with simulated data the resultsshowed that their strategy performs well In addition [17]suggested the Ornstein-Uhlenbeck process to make a marketmicrostructure noise used as a trading signal in pairs tradingstrategy The performance is better under this method thanin traditional estimators such as ARIMA(11) and maximumlikelihood Reference [18] applied a cointegration methodto Chinese commodity futures from 2006 to 2016 to checkwhether pairs trading was suitable in that market They usedOLS regression to create spreads from the pairs Furthermore[10] applied a cointegration test to assorted pairs of stocks anda vector error-correction model to create a trading signal

It is important to set a boundary to optimize the pairs-trading strategy This boundary is a criterion for decidingwhether to execute a pairs-trading strategy If a low boundaryis set many strategies will be executed but profits will belower if a high boundary is set investors will get high returnswhen the strategy is executed However all this assumesthat mean reversion occurs If the spread does not returnto the average in the specified trading window losses willbe incurred If a low boundary is set the loss will be smallHowever if the strategy is executed with a high boundary theloss will increase Therefore the performance of pair tradingdepends on how the boundary is set Reference [14] suggestedtaking a minimum-profit condition which could be efficientto reduce losses in a pairs-trading system They set a tradingrule with a diverse open condition for example if the spreadis above 03 05 075 10 and 15 standard deviations Theyused the daily closing prices from January 2 2001 to August30 2002 of two stocks the Australia New Zealand Bankand the Adelaide Bank The results showed that as the opencondition value decreases the number of trades and profitsincreases Also [19] suggested optimal preset boundariescalculated from estimated parameters for the average tradeduration intertrade interval and number of trades and usedthem to maximize the minimum total profit They used thedaily closing price data from January 2 2004 to June 30 2005of seven pairs of stocks on the Australian Stock ExchangeThe results showed that their proposed method was efficientin making profits using the pairs-trading strategy Reference

[18] examined whether the pairs-trading strategy could beapplied to the daily return of Chinese commodity futuresfrom 2006 to 2016 using three methods classical closed-loop and dynamic stop-loss The closed-loop method takesonly a stop-profit barrier which executes the strategy anddoes not consider the risk if spreads revert to the mean Theclassical method adds stop-loss boundaries to the closed-loop method The dynamic stop-loss method uses a varietyof stop-profit and stop-loss barriers to fit the spreads if thespread is larger than the standard deviation which is set usingcriteria based on the historical average of spreads The resultsshowed that these methods obtained an annualized return ofover 15 especially the closed-loop method which yieldedthe highest profit of 2694 In addition [20] experimentedwith fixed optimal threshold selection conditional volatilitypercentile spectral analysis and neural network thresholds inpairs-trading strategy Of these the neural network thresholdhas outperformed all other strategies

Following the success of reinforcement learning demon-strated by its successful performance at Atari games [21]many researchers have attempted to apply this algorithmto the financial trading system Reference [22] proposed adeep Q-trading system using reinforcement learning meth-ods They applied Q-learning to a trading system to tradeautomatically They set a delta price using data from the past120 days had three discrete action spaces (buy hold andsell) and used long-term profit as a reward They used dailydata from January 01 2001 to December 31 2015 of theHang Seng Index and the SampP 500 Index The experimentalresults showed that their proposed method outperformedbuy-and-hold strategies and recurrent reinforcement learn-ing methods Reference [23] proposed three steps to applyreinforcement learning to the financial trading system Firstthey reduced relative replay size to fit financial tradingSecond they proposed an action-augmentation techniquethat provides more feedback from the action to the agentThird they used long sequences as reinforcement data toconduct recurrent neural network trainingThe experimentaldata comprised tick-by-tick data of 12 forex currency pairsfrom January 2012 to December 2017The results showed thatthe action-augmentation technique yielded more profit thanan epsilon-greedy policy Reference [10] used an N-armedbandit problem to optimize the pairs-trading strategy Theytook the spread using an error-correction model and foundthe parameters using a grid-search algorithmThey comparedtheir proposed model with a constant parameter modelwhich was similar to a traditional pairs-trading strategyTheyused intraday one-minute data of some stocks in the FactSetdatabase from June 2015 to January 2016The performance oftheir proposedmodel was better than the constant-parametermodel

We investigate not only the dynamic boundary basedon a spread in each trading windowmdashwhich can achievehigher profit than the fixed boundary used in traditionalpairs trading strategymdashbut also if it is possible to train deepreinforcement learning methods to follow this mechanismTo this end we propose a new method to optimize thepairs trading strategy using deep reinforcement learningespecially deep Q-networks since pairs trading strategy can

Complexity 3

be thought of as a game After opening a portfolio positionthe profit can be set whether portfolio is closed stop-lossposition Therefore if we set this strategy as a game bysetting boundaries which are optimized in spreads in tradingwindow we can achieve more profit than traditional pairstrading strategies In particular we set the pairs-trading sys-tem to be a kind of game and obtain the optimal boundariestrading thresholds and stop-loss thresholds according to thecalculated spread The reason for this construction is that ifthe portfolio is opened and closed in the trading window inthe calculated spread it will be unconditionally profitable ifthe portfolio is closed If the portfolio reaches the stop-lossboundary or does not converge to themean lossesmay occurWe therefore set the DQN to learn by positively rewarding itif it takes a closed position and negatively rewarding it if itreaches the stop-loss or exit thresholds We conducted thefollowing experiments to verify that our proposed methodis optimized compared to the conventional method Firstwe used different spreads calculated using OLS and TLS tosee how the results differ depending on the spread usedfor input Second depending on the formation window andtrading window the spread and hedge ratio will be variedWe therefore set a total of six window sizes for selecting theoptimalwindow sizewhich had the best performance Finallywe compared the proposedmethodwith the traditional pairs-trading strategy using the test data with the optimal windowsize In this experiment we use the daily adjusted closingprices from January 2 1990 to July 31 2018 of 50 stocksin the SampP 500 Index Experimental results show that ourproposed method outperforms the traditional pairs-tradingstrategy across all the pairs In addition we can confirm thatthe performance measure varies according to the spread

The main contributions of this study are as follows Firstwe propose a novel method to optimize pairs trading strat-egy using deep reinforcement learning especially deep Q-networks with trading and stop-loss boundaries The exper-imental results show that our method can be applied in thepairs trading system and also to various other fields includingfinance and economics when there is a need to optimize arule-based strategy to be more efficient Second we proposean optimized dynamic boundary based on a spread ineach trading window Our proposed method outperformstraditional pairs trading strategy which set a fixed boundaryLast we find that our method outperforms traditional pairstrading strategy in all pairs based on constituent stocks inSampP 500 Since our method selects optimal boundaries basedon spreads it can be applied to other stock markets such asKOSPI Nikkei and Hang Seng It should be noted that thepresent work is a part of the Master thesis [24]

The rest of this paper is organized as follows Section 2explains the technical background Section 3 describes thematerials and methods Section 4 shows the results andprovides a discussion of the experiments Section 5 providesour conclusions to this study

2 Technical Background

21 13e Traditional Pairs-Trading Strategy Pairs tradingis a representative market-neutral trading strategy which

simultaneously longs an undervalued stock and shorts anovervalued stockThis strategy is a formof statistical arbitragetrading that assumes the movements of the prices of thetwo assets will be similar to previous trends [1] It followsthe assumption that asset prices will return to the long-termequilibriumThis strategy started from the idea that arbitrageopportunities exist when the price gap between two assetsexpands to or past a certain level It is also based on the beliefthat historical price movements will not change significantlyin the future

In Figure 1 the graph drawn in blue is a spread made oftwo stocks that are cointegrated the red lines are the tradingboundaries and the green lines are the stop-loss boundariesWhen this spread reaches the trading boundaries the port-folio is opened and only closed when the spread returns tothe average However losses are incurred when prices reachthe stop-loss boundaries after the portfolio is opened and donot return to the average Furthermore after the portfolio isopened if the trading signal is not reversed to mean duringthe trading window the portfolio is closed by force this iscalled the exit position of the portfolio

211 13e Cointegration Test There are many approachesfor pair selection such as the discrete approach [11 25ndash27]the cointegration approach [10 16 27] and the stochasticapproach [7 8] In this study we use the cointegrationapproach to choose pairs which have long-term equilibriumGenerally a linear combination of nonstationary variables isalso a nonstationary relationship Assume that 119909119905 and 119910119905 haveunit roots as previously mentioned the linear combinationof these variables follows nonstationary conditions

119909119905 sim 119868 (1) 119910119905 sim 119868 (1) (1)

119910119905 = 120572 + 120573119909119905 + 120576119905 (2)

However it can be a stationary relationship if the nonsta-tionary variables are cointegrated In this case this regressionmust be checked to determine whether it is a spuriousregression or cointegrated Johansenrsquos method is widely usedto test for cointegration [28] In this method the numberof cointegration relations and the parameters of the modelare estimated and tested using maximum likelihood estima-tion (MLE) Since all variables are regarded as endogenousvariables there is no need to select dependent variablesand multiple cointegration relationships are identified Inaddition we use MLE to estimate the cointegration relationwith the vector autoregression model and to determinethe cointegration coefficient based on the likelihood-ratiotest There is therefore an advantage in performing varioushypothesis tests related to the estimation of cointegrationparameters and the setting of other models when there iscointegration and not merely to test for cointegration

22 Spread Calculation

221 Ordinary Least Squares In regression analysis OLS iswidely used to estimate parameters by minimizing the sum

4 Complexity

Z-sc

ore

trading signaltrading boundarystop-loss boundary

Year

8

6

4

2

0

minus2

minus4

minus6

minus82009200720052003200119991997199519931991

Figure 1 The traditional pairs-trading strategy

of the squared errors [29] Assume that 119909119894 119910119894 and 120576119894 arean independent variable a dependent variable and an errorterm We can estimate 120573 from the following equation bytaking a partial derivative

119910119894 = 120573119909119894 + 120576119894 sim 119873(0 1205902120576 ) (3)

119899sum119894=1

(119910119894 minus 120573119909119894)2 (4)

120573 = ( 119899sum119894=1

1199091198941015840119909119894)minus1 119899sum119894=1

1199091198941015840119910119894 (5)

The value obtained from equation (5) is used for the numberof stock orders The epsilon value is also used as a tradingsignal through Z-scoring in the state composed of theformation-window size

222 Total Least Squares TLS estimates parameters to min-imize the sum of the measured distance and the verticaldistance between regression lines [30] Since the verticaldistance does not change when the X and Y coordinates arechanged the value of 120573 is calculated consistently In the TLSmethod the observed values of 119883119894 and 119884119894 have the followingerror terms

119884119894 = 119910119894 + 119890119894 sim 119873(0 1205902119890 ) (6)

119883119894 = 119909119894 + 119906119894 sim 119873(0 1205902119906) (7)

where 119909119894 and 119910119894 are true values and 119890119894 and 119906119894 are errorterms following independent identical distributions It isassumed that there is linear combination of true values Forconvenience we represent the error variance ratio in equation(10)

119910119894 = 1205730 + 1205731119909119894 (8)

119884119894 = 1205730 + 1205731119909119894 + 119890119894 sim 119873(0 1205902119890 ) (9)

120591 = var (119884119894 | 119909119894)var (119883119894 | 119909119894) = 12059021198901205902119906 (10)

The orthogonal regression estimator is calculated by mini-mizing the sum of the measured distance and the verticaldistance between regression lines in equation (11)

119899sum119894=1

(119884119894 minus 1205730 + 1205731119909119894)2120591 + (119883119894 minus 119909119894)2 (11)

1205731 = 1199042119884119884 minus 1205911199042119883119883 + (1199042119884119884 minus 1205911199042119883119883)2 + 41205911199042119883119884122119904119883119884

(12)

The value obtained from equation (12) is used in the sameway as that obtained from equation (5) and the epsilon valueis also used as a trading signal through the Z-score in the statecomposed of the formation-window size

23 Reinforcement Learning and the Deep Q-Network Theidea of reinforcement learning is to find an optimal policywhich maximizes the expected sum of discounted futurerewards [31] These rewards come from selecting the optimalvalue of each action called the optimal Q-value Rein-forcement learning basically solves the problem defined bythe Markov decision process (MDP) It consists of a tuple(119878 119860 119875 119877 120574) where 119878 is a finite set of states 119860 is a finite setof actions 119875 is a state transition probability matrix 119877 is areward function and 120574 is a discount factor In environment120576 agent-observed state 119904119905 at time 119905 action 119886119905 is selectedFrom the results of these sequences environmental feedbackis provided to the agent in the form of reward 119903119905 and nextstate 119904119905+1 An action is selected by the action-value function119876120587(119904 119886) that represents the expected sum of discountedfuture rewards

119876120587 (119904119905 119886119905) = E120587 [ 119879sum119894=119905

120574119894minus119905119903119894 | 119904119905 119886119905 120587] (13)

In this action-value function 119876120587(119904119905 119886119905) we find an optimalaction-value function 119876lowast(119904119905 119886119905) following an optimal policy

Complexity 5

which maximizes the expected sum of discounted futurerewards

119876lowast (119904119905 119886119905) = max120587

119876120587 (119904119905 119886119905) (14)

This optimal action-value function can be formulated as theBellman equation

119876lowast (119904119905 119886119905) = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1)] (15)

TheDQNuses a nonlinear function approximator to estimatethe action value function This network is trained by min-imizing a sequence of loss functions 119871 119905(120579119905) which changeswith each sequence of 119905 The weight of 120579119905 is updated as thesequence progresses

119871 119905 (120579119905) = E(119904119886)sim120588(∙) [(119910119905 minus 119876 (119904119905 119886119905 120579119905))2] (16)

119910119905 = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1 120579119905minus1) | 119904119905 119886119905] (17)

3 Materials and Methods

31 Data In this study 50 stocks from the SampP 500 Indexwere selected based on their trading volume and marketcapitalization To carry out the experiment the data mustcover the same period Therefore corresponding stocks wereselected leaving a total of 25 stocks Table 1 represents thedataset of stock names abbreviations of those stocks andtheir respective sectors We collected the adjusted dailyclosing prices using Thomson Reutersrsquo database The periodof the training dataset is from January 2 1990 to December31 2008 comprising 4792 data points the test dataset coversthe period from January 2 2009 to July 31 2018 comprising2411 data points From these datasets a pair of stocks willbe selected during the training dataset period using thecointegration test

32 Selecting PairsUsing theCointegrationTest It is necessaryto pair stocks which have long-run statistical relationshipsor similar price movements It is possible to determine thedegree to which two stocks have had similar pricemovementsthrough the correlation value Furthermore the long-termequilibrium of a pair of stocks is an important characteristicfor the execution of pairs trading In this study we usedthe cointegration approach to select pairs of stocks ThroughJohansenrsquos method we selected 11 pairs of stocks that havelong-run equilibria Table 2 shows the resulting pairs of stocksthat were identified based on t-statistics and Figure 2 showsprice movements of the cointegrated stocks XOM and CVXUsing this dataset we will verify whether our proposedmethod has better performance than the traditional pairs-trading method

33 Trading Signal After selecting the pairs it is necessaryto extract the signal for trading To extract signals we optfor the OLS or TLS methods First because the stock pricefollows a random walk [32] we need to ensure that it followsthe 119868(1) process through the augmented Dickey-Fuller testSubsequently the 119868(0) process should be created using the

logarithmic difference in stock prices which is then applied tothe OLS and TLS methods In equation (18) 1205721 is a constantvalue 1205731 is a hedge ratio (which is used as trading size) 120576119905is the error term and log119875119860119905 and log119875119861119905 are the logarithmicdifferences in the stock prices 119860 and 119861 at time 119905 We convertvalues of 120576119905 into a Z-score used as a trading signal Forexample if the trading signal reaches the threshold we shortone share of the overvalued stock (represented as log119875119860119905)and long 1205731 shares of the undervalued stock (representedas log119875119861119905) The hedge ratio is determined based on thewindow size We set a total of six discrete window sizes toobtain the optimal window size for the experiment Tradingwindows are constituted using half of the formation-windowsizeThe spread obtained here is used as a state when applyingreinforcement learning (ie as an input of the DQN)

log119875119861119905 = 1205721 + 1205731 log119875119860119905 + 120576119905 (18)

34 Proposed Method Optimized Pairs-Trading Strategy UsingtheDQNMethod In this study we optimize the pairs-tradingstrategy with a type of game using the DQN We will attemptto implement an optimal pairs-trading strategy by takingoptimal trading and stop-loss boundaries that correspond tothe given spread since performance depends on how tradingand stop-loss boundaries are set in pairs trading [14] Figure 3shows themechanism of our proposed pairs-trading strategyThroughout the cointegration test we identify pairs andusing regression analysis obtain a hedge ratio used as tradingvolume and a spread used as a trading signal and state In thecase of the DQN two hidden layers are set up and the numberof neurons is optimized by taking half of input size throughtrial and error Action values consist of the six discrete spacesin Table 3 Each value of 119886119905 has values for trading and stop-loss boundaries

A pairs-trading system can make a profit if the spreadtouches the threshold and returns to the average such that theportfolio is closed in each tradingwindowOn the other handif the trading boundary is touched and the stop-loss boundaryis reached the system tries to minimize losses by stoppingtrades If the spread touches the trading boundary but fails toreturn to the average the strategy may end up with a profitor a loss In this study the pairs-trading strategy is thereforeconsidered as a kind of game closing a portfolio yields a posi-tive reward and a portfolio that reaches its stop-loss thresholdyields a negative reward Although an exited portfolio maypossibly generate a positive profit there is also a possibilitythat losses will occur and it is therefore set to yield a negativerewardWe set the other conditions (such as the maintenanceof the portfolio or not to execute the portfolio) to zero so asto concentrate on the close stop-loss and exit positions

119882119905 =100381610038161003816100381610038161003816100381610038161003816V119860119905 times

1198781198601199051015840 minus 119878119860119905119878119860119905 + V119861119905 times 1198781198611199051015840 minus 119878119861119905119878119861119905100381610038161003816100381610038161003816100381610038161003816 119905 lt 1199051015840 (19)

119877119905

=

1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119888119897119900119904119890119889minus1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119903119890119886119888ℎ119890119904 119904119905119900119901 minus 119897119900119904119904 119905ℎ119903119890119904ℎ119900119897119889minus500 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119890119909119894119905119890119889

(20)

6 Complexity

Table 1 The 25 stocks on the SampP 500 Index used in this study

No Ticker Stock Sector1 AAPL Apple Inc Technology2 MSFT Microsoft Corporation Technology3 BRKa Berkshire Hathaway Inc Financial Services4 JPM JPMorgan Chase amp Co Financial Services5 JNJ Johnson amp Johnson Healthcare6 XOM Exxon Mobil Corporation Energy7 BAC Bank of America Corporation Financial Services8 WFC Wells Fargo amp Company Financial Services9 WMT Walmart Inc Consumer Defensive10 UNH UnitedHealth Group Incorporated Healthcare11 CVX Chevron Corporation Energy12 T ATampT Inc Communication Services13 PFE Pfizer Inc Healthcare14 ADBE Adobe Systems Incorporated Technology15 MCD McDonaldrsquos Corporation Consumer Cyclical16 MDT Medtronic plc Healthcare17 MMM 3M Company Industrials18 HON Honeywell International Inc Industrials19 GE General Electric Company Industrials20 ABT Abbott Laboratories Healthcare21 MO Altria Group Inc Consumer Defensive22 UNP Union Pacific Corporation Industrials23 TXN Texas Instruments Incorporated Technology24 UTX United Technologies Corporation Industrials25 LLY Eli Lilly and Company Healthcare

Year

Pric

e

20172013200920052001199719931989

140

120

100

80

60

40

20

XOMCVX

Figure 2 Cointegrated stock price movements

We fix the values of portfolio close stop-loss and exitto +1000 minus1000 and minus500 respectively When we updatethe Q-values we must consider the reward as a significantcomponent of efficiently training the DQN We therefore setthe reward value to have a range similar to that of the Q-value Additionally we included the corresponding profit orloss value to reflect that weight after the trading ended Inequation (19) V119860119905 and V119861119905 are the stock orders of stocks119860 and

119861 at time 119905 119878119860119905 and 119878119861119905 are the stock prices of119860 and 119861 at time119905 and 1198781198601199051015840 and 1198781198611199051015840 are the stock prices of119860 and 119861 at time 1199051015840Algorithm 1 shows the process of our proposed method

Beforewe start our proposedmethodwe set a replaymemoryand batch size and select pairs using the cointegration testAt each epoch we initialized total profit to 10 In thetraining scheme we set a state which has spreads withinthe formation window and select actions which are used as

Complexity 7

50 constituent stocks of the SampP500 Index

Filter out pairs based on trading volume liquidity andthe cointegration test

Obtain a rewardEnvironmentConstruct pairs of stocks

Preprocess datasetusing OLS or TLS Select max Q-value

Q_valuesOutputs of DQN

Deep Q-NetworkAgent

SpreadHedge ratio

Inputs of DQN

Figure 3 Steps for proposed pairs-trading strategy using the DQNmethod

Initialize replay memory119863 and batch size119873Initialize deep Q-networkSelect pairs using cointegration test(1) For each epoch do(2) Profit = 10(3) For steps t = 1 until end of training data set do(4) Calculate spreads using OLS or TLS methods(5) Obtain initial state by converting spread to Z-score based on formation window 119904

119905

(6) Using epsilon-greedy method select a random action 119886119905

(7) Otherwise select 119886119905= 119886119903119892119898119886119909

119886119876(119904119905 119886)

(8) Execute traditional pairs-trading strategy based on the action selected(9) Obtain reward 119903

119905by performing the pairs-trading strategy

(10) Set next state 119904119905+1

(11) Store transition (119904119905119886119905 119903119905 119904119905+1) in119863

(12) Sample minibatch of transition (119904119905119886119905 119903119905 119904119905+1) from119863

(13) 119910119905= 119903119905

119894119891 119904119905+1 = 119905119890119903119898119894119899119886119897

119903119905+ 120574119898119886119909

1198861015840119876 (119904119905+11198861015840) 119894119891 119904119905+1 = 119899119900119899 minus 119905119890119903119898119894119899119886119897

(14) Update Q-network by performing a gradient descent step on 119910119905minus119876(119904

119905119886)2

(15) End(16) End

Algorithm 1 Optimized pairs-trading system using DQN

trading and stop-loss boundaries Throughout the tradingwindow we executed a strategy similar to a traditional pairs-trading strategy using the action selected After executingthe strategy we obtain a reward based on the results of theportfolio Finally for the Q-learning process we update theQ-networks by performing a gradient descent step

35 Performance Measure We check our experiment resultsbased on profit maximum drawdown and the Sharpe ratioProfit is commonly used as a performance measure fortrading strategies It is calculated as the sum of returnstaking into consideration trading cost Since many trades canincrease total profit it is necessary to determine the totalprofit taking into consideration transaction costs dependingon trading volume In this study we set a trading cost of 5 bpequation (21) is almost the same as equation (19) but it doesnot include absolute value and 119862 is trading cost Maximumdrawdown represents themaximum cumulative loss from the

highest to the lowest values of the portfolio during a giveninvestment period where119875(119905) is the value of the portfolio and119879 is the terminal time value The Sharpe ratio is an indicatorof the degree of excess profits from investing in risky assetsused in evaluating portfolios [33] In equation (23) 119877119901 is theexpected sum of portfolio returns and 119877119891 is the risk-free ratewe set this value to 0 and 120590119901 is the standard deviation ofportfolio returns

119875119903119900119891119894119905 = 119879sum119905=1

[(V1119905 lowast 11987811199051015840 minus 11987811199051198781119905 + V2119905 lowast 11987821199051015840 minus 11987821199051198782119905 )

minus 119862 lowast (V1119905 + V2119905)](21)

119872119863119863(119879) = max120591isin(0119879)

(max119905isin(0120591)

119875 (119905) minus 119901 (120591)) (22)

8 Complexity

Table 2 Summary statistics for pairs verified using cointegrationtests

No Pairs t-statistic Correlation1 MSFTJPM minus35423lowastlowast 091652 MSFTTXN minus3448lowastlowast 086413 BRKaABT minus35148lowastlowast 094934 BRKaUTX minus33992lowastlowast 096095 JPMT minus35882lowastlowast 084866 JPMHON minus58209lowast lowast lowast 092507 JPMGE minus34494lowastlowast 091058 JNJWFC minus35696lowastlowast 096939 XOMCVX minus405lowast lowast lowast 0987910 HONTXN minus40625lowast lowast lowast 0746911 GETXN minus3467lowastlowast 09148Note lowast lowast lowast and lowastlowast denote a rejection of the null hypothesis at the 1 and5 significance levels respectively

119878ℎ119886119903119901119890 119903119886119905119894119900 = 119877119901 minus 119877119891120590119901 (23)

The Materials and Methods section should contain sufficientdetails so that all procedures can be repeated It may bedivided into headed subsections if several methods aredescribed

4 Results and Discussion

We use the stock pair XOM and CVX which rejects the nullhypothesis at the 1 significance level to verify whether ourproposed model is trained well The lengths of the windowsizes such as the formation window and trading windoware selected from the performance results with the trainingdataset From these results we select an optimized windowsize and compare our proposed model with traditional pairstrading which takes a constant set of actions with the testdataset

41 Training Results To find the optimum window sizefor the optimized pairs-trading system we experimentedwith six cases We performed the experiments based onsix window sizes and the results for each window size arecalculated by averaging the top-5 results for a total of 11 pairsFrom Tables 4 and 5 we can find that the best performanceis obtained when the formation and training windows are 30and 15 respectively based on the profit generated by both theOLS and TLS methods When we trained our networks weset a positive reward for taking more closed positions andfewer stop-loss and exit positionsWe canfind the lowest ratioof portfolio closed positions based on the number of openpositions which in the formation and trading windows arefor 30 and 15 days (068) Contrary to this result the highestratios of the number of closed positions in the formation andtrading windows are for 120 and 60 days (073) Howeverthe highest profits reported in the formation and tradingwindows are for 30 and 15 days This can be explained whenwe check the ratio of the number of stop-loss portfolios

The formation and trading window sizes are 30 and 15 daysand the ratio of portfolio stop-loss position is 013 but theformation and trading window sizes are 020 This resultindicates that it is important to reduce the stop-loss positionwhile increasing the closed position In addition we can seethat the trading signals made with the TLS method are betterthan thosemadewith theOLSmethod in all six of the discretewindow sizes The reason for this is based on the differencebetween the hedge ratios of the two methods In OLS whenone side is the reference the relative change of the other sideis estimated Since the assumption is that there is no errorcomponent on the reference side and there is an error onlyon the other side the hedge ratio varies depending on the sideused as the reference However in TLS hedging ratios are thesame regardless of which side is used as the reference For thisreason the experimental results confirm that the TLSmethodis better able to determine when to execute the pairs-tradingstrategy From these results we take the optimum windowsize when we verify our proposed method in the test datasetHowever we first need to ensure that the model we proposedis well-trained

It is important to check whether our reinforcementlearning algorithm is trained well Reference [21] suggestedthat a steadily increasing average of Q-values is evidence thatthe DQN is learning well Figure 4(a) shows the average Q-values of HON and TXN as training progressed We findthat the average Q-values steadily increased indicating thatour proposed model is properly trained In addition weprovide a positive reward when the portfolio closes and anegative reward when the portfolio reaches the stop-lossthreshold or exits Figure 4(b) shows the ratio of the numberof portfolio positions as training progressed The ratio ofclosed to open portfolio positions increased and the ratioof portfolios reaching their stop-loss thresholds to openportfolio positions decreased We also find that the ratio ofportfolio exits to open portfolio positions slightly increasedIt is possible that the rewards given for an open portfolioposition compared to those given for a closed portfolioposition are relatively small The DQN is therefore trainedto prevent portfolios from reaching their stop-loss thresholds(the more important objective) over exiting them This resultcan also serve as a basis for judging whether the proposedmodel is being trained properly

Tables 6 and 7 represent the performance results of XOMand CVX in the training dataset We call our proposedmodel pairs-trading DQN (PTDQN) and traditional pairstrading with constant action values as pairs trading withaction 0 (PTA0) to pairs trading with action 5 (PTA5) Fromthis result we can confirm that our proposed method ismore profitable than the constant pairs-trading strategiesIn addition we can see that the TLS method has a higherprofitability compared to the OLS method From PTA0 toPTA5 the trading boundary and the stop-loss boundarygrew larger the numbers of open and closed portfolios andportfolios that reached their stop-loss thresholds are reducedIn other words there is less opportunity for profit but theprobability of loss is also reduced It is important not only totake a lot of closed positions but also to take the best actionto open and close the portfolio For example if a portfolio is

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 3: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 3

be thought of as a game After opening a portfolio positionthe profit can be set whether portfolio is closed stop-lossposition Therefore if we set this strategy as a game bysetting boundaries which are optimized in spreads in tradingwindow we can achieve more profit than traditional pairstrading strategies In particular we set the pairs-trading sys-tem to be a kind of game and obtain the optimal boundariestrading thresholds and stop-loss thresholds according to thecalculated spread The reason for this construction is that ifthe portfolio is opened and closed in the trading window inthe calculated spread it will be unconditionally profitable ifthe portfolio is closed If the portfolio reaches the stop-lossboundary or does not converge to themean lossesmay occurWe therefore set the DQN to learn by positively rewarding itif it takes a closed position and negatively rewarding it if itreaches the stop-loss or exit thresholds We conducted thefollowing experiments to verify that our proposed methodis optimized compared to the conventional method Firstwe used different spreads calculated using OLS and TLS tosee how the results differ depending on the spread usedfor input Second depending on the formation window andtrading window the spread and hedge ratio will be variedWe therefore set a total of six window sizes for selecting theoptimalwindow sizewhich had the best performance Finallywe compared the proposedmethodwith the traditional pairs-trading strategy using the test data with the optimal windowsize In this experiment we use the daily adjusted closingprices from January 2 1990 to July 31 2018 of 50 stocksin the SampP 500 Index Experimental results show that ourproposed method outperforms the traditional pairs-tradingstrategy across all the pairs In addition we can confirm thatthe performance measure varies according to the spread

The main contributions of this study are as follows Firstwe propose a novel method to optimize pairs trading strat-egy using deep reinforcement learning especially deep Q-networks with trading and stop-loss boundaries The exper-imental results show that our method can be applied in thepairs trading system and also to various other fields includingfinance and economics when there is a need to optimize arule-based strategy to be more efficient Second we proposean optimized dynamic boundary based on a spread ineach trading window Our proposed method outperformstraditional pairs trading strategy which set a fixed boundaryLast we find that our method outperforms traditional pairstrading strategy in all pairs based on constituent stocks inSampP 500 Since our method selects optimal boundaries basedon spreads it can be applied to other stock markets such asKOSPI Nikkei and Hang Seng It should be noted that thepresent work is a part of the Master thesis [24]

The rest of this paper is organized as follows Section 2explains the technical background Section 3 describes thematerials and methods Section 4 shows the results andprovides a discussion of the experiments Section 5 providesour conclusions to this study

2 Technical Background

21 13e Traditional Pairs-Trading Strategy Pairs tradingis a representative market-neutral trading strategy which

simultaneously longs an undervalued stock and shorts anovervalued stockThis strategy is a formof statistical arbitragetrading that assumes the movements of the prices of thetwo assets will be similar to previous trends [1] It followsthe assumption that asset prices will return to the long-termequilibriumThis strategy started from the idea that arbitrageopportunities exist when the price gap between two assetsexpands to or past a certain level It is also based on the beliefthat historical price movements will not change significantlyin the future

In Figure 1 the graph drawn in blue is a spread made oftwo stocks that are cointegrated the red lines are the tradingboundaries and the green lines are the stop-loss boundariesWhen this spread reaches the trading boundaries the port-folio is opened and only closed when the spread returns tothe average However losses are incurred when prices reachthe stop-loss boundaries after the portfolio is opened and donot return to the average Furthermore after the portfolio isopened if the trading signal is not reversed to mean duringthe trading window the portfolio is closed by force this iscalled the exit position of the portfolio

211 13e Cointegration Test There are many approachesfor pair selection such as the discrete approach [11 25ndash27]the cointegration approach [10 16 27] and the stochasticapproach [7 8] In this study we use the cointegrationapproach to choose pairs which have long-term equilibriumGenerally a linear combination of nonstationary variables isalso a nonstationary relationship Assume that 119909119905 and 119910119905 haveunit roots as previously mentioned the linear combinationof these variables follows nonstationary conditions

119909119905 sim 119868 (1) 119910119905 sim 119868 (1) (1)

119910119905 = 120572 + 120573119909119905 + 120576119905 (2)

However it can be a stationary relationship if the nonsta-tionary variables are cointegrated In this case this regressionmust be checked to determine whether it is a spuriousregression or cointegrated Johansenrsquos method is widely usedto test for cointegration [28] In this method the numberof cointegration relations and the parameters of the modelare estimated and tested using maximum likelihood estima-tion (MLE) Since all variables are regarded as endogenousvariables there is no need to select dependent variablesand multiple cointegration relationships are identified Inaddition we use MLE to estimate the cointegration relationwith the vector autoregression model and to determinethe cointegration coefficient based on the likelihood-ratiotest There is therefore an advantage in performing varioushypothesis tests related to the estimation of cointegrationparameters and the setting of other models when there iscointegration and not merely to test for cointegration

22 Spread Calculation

221 Ordinary Least Squares In regression analysis OLS iswidely used to estimate parameters by minimizing the sum

4 Complexity

Z-sc

ore

trading signaltrading boundarystop-loss boundary

Year

8

6

4

2

0

minus2

minus4

minus6

minus82009200720052003200119991997199519931991

Figure 1 The traditional pairs-trading strategy

of the squared errors [29] Assume that 119909119894 119910119894 and 120576119894 arean independent variable a dependent variable and an errorterm We can estimate 120573 from the following equation bytaking a partial derivative

119910119894 = 120573119909119894 + 120576119894 sim 119873(0 1205902120576 ) (3)

119899sum119894=1

(119910119894 minus 120573119909119894)2 (4)

120573 = ( 119899sum119894=1

1199091198941015840119909119894)minus1 119899sum119894=1

1199091198941015840119910119894 (5)

The value obtained from equation (5) is used for the numberof stock orders The epsilon value is also used as a tradingsignal through Z-scoring in the state composed of theformation-window size

222 Total Least Squares TLS estimates parameters to min-imize the sum of the measured distance and the verticaldistance between regression lines [30] Since the verticaldistance does not change when the X and Y coordinates arechanged the value of 120573 is calculated consistently In the TLSmethod the observed values of 119883119894 and 119884119894 have the followingerror terms

119884119894 = 119910119894 + 119890119894 sim 119873(0 1205902119890 ) (6)

119883119894 = 119909119894 + 119906119894 sim 119873(0 1205902119906) (7)

where 119909119894 and 119910119894 are true values and 119890119894 and 119906119894 are errorterms following independent identical distributions It isassumed that there is linear combination of true values Forconvenience we represent the error variance ratio in equation(10)

119910119894 = 1205730 + 1205731119909119894 (8)

119884119894 = 1205730 + 1205731119909119894 + 119890119894 sim 119873(0 1205902119890 ) (9)

120591 = var (119884119894 | 119909119894)var (119883119894 | 119909119894) = 12059021198901205902119906 (10)

The orthogonal regression estimator is calculated by mini-mizing the sum of the measured distance and the verticaldistance between regression lines in equation (11)

119899sum119894=1

(119884119894 minus 1205730 + 1205731119909119894)2120591 + (119883119894 minus 119909119894)2 (11)

1205731 = 1199042119884119884 minus 1205911199042119883119883 + (1199042119884119884 minus 1205911199042119883119883)2 + 41205911199042119883119884122119904119883119884

(12)

The value obtained from equation (12) is used in the sameway as that obtained from equation (5) and the epsilon valueis also used as a trading signal through the Z-score in the statecomposed of the formation-window size

23 Reinforcement Learning and the Deep Q-Network Theidea of reinforcement learning is to find an optimal policywhich maximizes the expected sum of discounted futurerewards [31] These rewards come from selecting the optimalvalue of each action called the optimal Q-value Rein-forcement learning basically solves the problem defined bythe Markov decision process (MDP) It consists of a tuple(119878 119860 119875 119877 120574) where 119878 is a finite set of states 119860 is a finite setof actions 119875 is a state transition probability matrix 119877 is areward function and 120574 is a discount factor In environment120576 agent-observed state 119904119905 at time 119905 action 119886119905 is selectedFrom the results of these sequences environmental feedbackis provided to the agent in the form of reward 119903119905 and nextstate 119904119905+1 An action is selected by the action-value function119876120587(119904 119886) that represents the expected sum of discountedfuture rewards

119876120587 (119904119905 119886119905) = E120587 [ 119879sum119894=119905

120574119894minus119905119903119894 | 119904119905 119886119905 120587] (13)

In this action-value function 119876120587(119904119905 119886119905) we find an optimalaction-value function 119876lowast(119904119905 119886119905) following an optimal policy

Complexity 5

which maximizes the expected sum of discounted futurerewards

119876lowast (119904119905 119886119905) = max120587

119876120587 (119904119905 119886119905) (14)

This optimal action-value function can be formulated as theBellman equation

119876lowast (119904119905 119886119905) = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1)] (15)

TheDQNuses a nonlinear function approximator to estimatethe action value function This network is trained by min-imizing a sequence of loss functions 119871 119905(120579119905) which changeswith each sequence of 119905 The weight of 120579119905 is updated as thesequence progresses

119871 119905 (120579119905) = E(119904119886)sim120588(∙) [(119910119905 minus 119876 (119904119905 119886119905 120579119905))2] (16)

119910119905 = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1 120579119905minus1) | 119904119905 119886119905] (17)

3 Materials and Methods

31 Data In this study 50 stocks from the SampP 500 Indexwere selected based on their trading volume and marketcapitalization To carry out the experiment the data mustcover the same period Therefore corresponding stocks wereselected leaving a total of 25 stocks Table 1 represents thedataset of stock names abbreviations of those stocks andtheir respective sectors We collected the adjusted dailyclosing prices using Thomson Reutersrsquo database The periodof the training dataset is from January 2 1990 to December31 2008 comprising 4792 data points the test dataset coversthe period from January 2 2009 to July 31 2018 comprising2411 data points From these datasets a pair of stocks willbe selected during the training dataset period using thecointegration test

32 Selecting PairsUsing theCointegrationTest It is necessaryto pair stocks which have long-run statistical relationshipsor similar price movements It is possible to determine thedegree to which two stocks have had similar pricemovementsthrough the correlation value Furthermore the long-termequilibrium of a pair of stocks is an important characteristicfor the execution of pairs trading In this study we usedthe cointegration approach to select pairs of stocks ThroughJohansenrsquos method we selected 11 pairs of stocks that havelong-run equilibria Table 2 shows the resulting pairs of stocksthat were identified based on t-statistics and Figure 2 showsprice movements of the cointegrated stocks XOM and CVXUsing this dataset we will verify whether our proposedmethod has better performance than the traditional pairs-trading method

33 Trading Signal After selecting the pairs it is necessaryto extract the signal for trading To extract signals we optfor the OLS or TLS methods First because the stock pricefollows a random walk [32] we need to ensure that it followsthe 119868(1) process through the augmented Dickey-Fuller testSubsequently the 119868(0) process should be created using the

logarithmic difference in stock prices which is then applied tothe OLS and TLS methods In equation (18) 1205721 is a constantvalue 1205731 is a hedge ratio (which is used as trading size) 120576119905is the error term and log119875119860119905 and log119875119861119905 are the logarithmicdifferences in the stock prices 119860 and 119861 at time 119905 We convertvalues of 120576119905 into a Z-score used as a trading signal Forexample if the trading signal reaches the threshold we shortone share of the overvalued stock (represented as log119875119860119905)and long 1205731 shares of the undervalued stock (representedas log119875119861119905) The hedge ratio is determined based on thewindow size We set a total of six discrete window sizes toobtain the optimal window size for the experiment Tradingwindows are constituted using half of the formation-windowsizeThe spread obtained here is used as a state when applyingreinforcement learning (ie as an input of the DQN)

log119875119861119905 = 1205721 + 1205731 log119875119860119905 + 120576119905 (18)

34 Proposed Method Optimized Pairs-Trading Strategy UsingtheDQNMethod In this study we optimize the pairs-tradingstrategy with a type of game using the DQN We will attemptto implement an optimal pairs-trading strategy by takingoptimal trading and stop-loss boundaries that correspond tothe given spread since performance depends on how tradingand stop-loss boundaries are set in pairs trading [14] Figure 3shows themechanism of our proposed pairs-trading strategyThroughout the cointegration test we identify pairs andusing regression analysis obtain a hedge ratio used as tradingvolume and a spread used as a trading signal and state In thecase of the DQN two hidden layers are set up and the numberof neurons is optimized by taking half of input size throughtrial and error Action values consist of the six discrete spacesin Table 3 Each value of 119886119905 has values for trading and stop-loss boundaries

A pairs-trading system can make a profit if the spreadtouches the threshold and returns to the average such that theportfolio is closed in each tradingwindowOn the other handif the trading boundary is touched and the stop-loss boundaryis reached the system tries to minimize losses by stoppingtrades If the spread touches the trading boundary but fails toreturn to the average the strategy may end up with a profitor a loss In this study the pairs-trading strategy is thereforeconsidered as a kind of game closing a portfolio yields a posi-tive reward and a portfolio that reaches its stop-loss thresholdyields a negative reward Although an exited portfolio maypossibly generate a positive profit there is also a possibilitythat losses will occur and it is therefore set to yield a negativerewardWe set the other conditions (such as the maintenanceof the portfolio or not to execute the portfolio) to zero so asto concentrate on the close stop-loss and exit positions

119882119905 =100381610038161003816100381610038161003816100381610038161003816V119860119905 times

1198781198601199051015840 minus 119878119860119905119878119860119905 + V119861119905 times 1198781198611199051015840 minus 119878119861119905119878119861119905100381610038161003816100381610038161003816100381610038161003816 119905 lt 1199051015840 (19)

119877119905

=

1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119888119897119900119904119890119889minus1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119903119890119886119888ℎ119890119904 119904119905119900119901 minus 119897119900119904119904 119905ℎ119903119890119904ℎ119900119897119889minus500 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119890119909119894119905119890119889

(20)

6 Complexity

Table 1 The 25 stocks on the SampP 500 Index used in this study

No Ticker Stock Sector1 AAPL Apple Inc Technology2 MSFT Microsoft Corporation Technology3 BRKa Berkshire Hathaway Inc Financial Services4 JPM JPMorgan Chase amp Co Financial Services5 JNJ Johnson amp Johnson Healthcare6 XOM Exxon Mobil Corporation Energy7 BAC Bank of America Corporation Financial Services8 WFC Wells Fargo amp Company Financial Services9 WMT Walmart Inc Consumer Defensive10 UNH UnitedHealth Group Incorporated Healthcare11 CVX Chevron Corporation Energy12 T ATampT Inc Communication Services13 PFE Pfizer Inc Healthcare14 ADBE Adobe Systems Incorporated Technology15 MCD McDonaldrsquos Corporation Consumer Cyclical16 MDT Medtronic plc Healthcare17 MMM 3M Company Industrials18 HON Honeywell International Inc Industrials19 GE General Electric Company Industrials20 ABT Abbott Laboratories Healthcare21 MO Altria Group Inc Consumer Defensive22 UNP Union Pacific Corporation Industrials23 TXN Texas Instruments Incorporated Technology24 UTX United Technologies Corporation Industrials25 LLY Eli Lilly and Company Healthcare

Year

Pric

e

20172013200920052001199719931989

140

120

100

80

60

40

20

XOMCVX

Figure 2 Cointegrated stock price movements

We fix the values of portfolio close stop-loss and exitto +1000 minus1000 and minus500 respectively When we updatethe Q-values we must consider the reward as a significantcomponent of efficiently training the DQN We therefore setthe reward value to have a range similar to that of the Q-value Additionally we included the corresponding profit orloss value to reflect that weight after the trading ended Inequation (19) V119860119905 and V119861119905 are the stock orders of stocks119860 and

119861 at time 119905 119878119860119905 and 119878119861119905 are the stock prices of119860 and 119861 at time119905 and 1198781198601199051015840 and 1198781198611199051015840 are the stock prices of119860 and 119861 at time 1199051015840Algorithm 1 shows the process of our proposed method

Beforewe start our proposedmethodwe set a replaymemoryand batch size and select pairs using the cointegration testAt each epoch we initialized total profit to 10 In thetraining scheme we set a state which has spreads withinthe formation window and select actions which are used as

Complexity 7

50 constituent stocks of the SampP500 Index

Filter out pairs based on trading volume liquidity andthe cointegration test

Obtain a rewardEnvironmentConstruct pairs of stocks

Preprocess datasetusing OLS or TLS Select max Q-value

Q_valuesOutputs of DQN

Deep Q-NetworkAgent

SpreadHedge ratio

Inputs of DQN

Figure 3 Steps for proposed pairs-trading strategy using the DQNmethod

Initialize replay memory119863 and batch size119873Initialize deep Q-networkSelect pairs using cointegration test(1) For each epoch do(2) Profit = 10(3) For steps t = 1 until end of training data set do(4) Calculate spreads using OLS or TLS methods(5) Obtain initial state by converting spread to Z-score based on formation window 119904

119905

(6) Using epsilon-greedy method select a random action 119886119905

(7) Otherwise select 119886119905= 119886119903119892119898119886119909

119886119876(119904119905 119886)

(8) Execute traditional pairs-trading strategy based on the action selected(9) Obtain reward 119903

119905by performing the pairs-trading strategy

(10) Set next state 119904119905+1

(11) Store transition (119904119905119886119905 119903119905 119904119905+1) in119863

(12) Sample minibatch of transition (119904119905119886119905 119903119905 119904119905+1) from119863

(13) 119910119905= 119903119905

119894119891 119904119905+1 = 119905119890119903119898119894119899119886119897

119903119905+ 120574119898119886119909

1198861015840119876 (119904119905+11198861015840) 119894119891 119904119905+1 = 119899119900119899 minus 119905119890119903119898119894119899119886119897

(14) Update Q-network by performing a gradient descent step on 119910119905minus119876(119904

119905119886)2

(15) End(16) End

Algorithm 1 Optimized pairs-trading system using DQN

trading and stop-loss boundaries Throughout the tradingwindow we executed a strategy similar to a traditional pairs-trading strategy using the action selected After executingthe strategy we obtain a reward based on the results of theportfolio Finally for the Q-learning process we update theQ-networks by performing a gradient descent step

35 Performance Measure We check our experiment resultsbased on profit maximum drawdown and the Sharpe ratioProfit is commonly used as a performance measure fortrading strategies It is calculated as the sum of returnstaking into consideration trading cost Since many trades canincrease total profit it is necessary to determine the totalprofit taking into consideration transaction costs dependingon trading volume In this study we set a trading cost of 5 bpequation (21) is almost the same as equation (19) but it doesnot include absolute value and 119862 is trading cost Maximumdrawdown represents themaximum cumulative loss from the

highest to the lowest values of the portfolio during a giveninvestment period where119875(119905) is the value of the portfolio and119879 is the terminal time value The Sharpe ratio is an indicatorof the degree of excess profits from investing in risky assetsused in evaluating portfolios [33] In equation (23) 119877119901 is theexpected sum of portfolio returns and 119877119891 is the risk-free ratewe set this value to 0 and 120590119901 is the standard deviation ofportfolio returns

119875119903119900119891119894119905 = 119879sum119905=1

[(V1119905 lowast 11987811199051015840 minus 11987811199051198781119905 + V2119905 lowast 11987821199051015840 minus 11987821199051198782119905 )

minus 119862 lowast (V1119905 + V2119905)](21)

119872119863119863(119879) = max120591isin(0119879)

(max119905isin(0120591)

119875 (119905) minus 119901 (120591)) (22)

8 Complexity

Table 2 Summary statistics for pairs verified using cointegrationtests

No Pairs t-statistic Correlation1 MSFTJPM minus35423lowastlowast 091652 MSFTTXN minus3448lowastlowast 086413 BRKaABT minus35148lowastlowast 094934 BRKaUTX minus33992lowastlowast 096095 JPMT minus35882lowastlowast 084866 JPMHON minus58209lowast lowast lowast 092507 JPMGE minus34494lowastlowast 091058 JNJWFC minus35696lowastlowast 096939 XOMCVX minus405lowast lowast lowast 0987910 HONTXN minus40625lowast lowast lowast 0746911 GETXN minus3467lowastlowast 09148Note lowast lowast lowast and lowastlowast denote a rejection of the null hypothesis at the 1 and5 significance levels respectively

119878ℎ119886119903119901119890 119903119886119905119894119900 = 119877119901 minus 119877119891120590119901 (23)

The Materials and Methods section should contain sufficientdetails so that all procedures can be repeated It may bedivided into headed subsections if several methods aredescribed

4 Results and Discussion

We use the stock pair XOM and CVX which rejects the nullhypothesis at the 1 significance level to verify whether ourproposed model is trained well The lengths of the windowsizes such as the formation window and trading windoware selected from the performance results with the trainingdataset From these results we select an optimized windowsize and compare our proposed model with traditional pairstrading which takes a constant set of actions with the testdataset

41 Training Results To find the optimum window sizefor the optimized pairs-trading system we experimentedwith six cases We performed the experiments based onsix window sizes and the results for each window size arecalculated by averaging the top-5 results for a total of 11 pairsFrom Tables 4 and 5 we can find that the best performanceis obtained when the formation and training windows are 30and 15 respectively based on the profit generated by both theOLS and TLS methods When we trained our networks weset a positive reward for taking more closed positions andfewer stop-loss and exit positionsWe canfind the lowest ratioof portfolio closed positions based on the number of openpositions which in the formation and trading windows arefor 30 and 15 days (068) Contrary to this result the highestratios of the number of closed positions in the formation andtrading windows are for 120 and 60 days (073) Howeverthe highest profits reported in the formation and tradingwindows are for 30 and 15 days This can be explained whenwe check the ratio of the number of stop-loss portfolios

The formation and trading window sizes are 30 and 15 daysand the ratio of portfolio stop-loss position is 013 but theformation and trading window sizes are 020 This resultindicates that it is important to reduce the stop-loss positionwhile increasing the closed position In addition we can seethat the trading signals made with the TLS method are betterthan thosemadewith theOLSmethod in all six of the discretewindow sizes The reason for this is based on the differencebetween the hedge ratios of the two methods In OLS whenone side is the reference the relative change of the other sideis estimated Since the assumption is that there is no errorcomponent on the reference side and there is an error onlyon the other side the hedge ratio varies depending on the sideused as the reference However in TLS hedging ratios are thesame regardless of which side is used as the reference For thisreason the experimental results confirm that the TLSmethodis better able to determine when to execute the pairs-tradingstrategy From these results we take the optimum windowsize when we verify our proposed method in the test datasetHowever we first need to ensure that the model we proposedis well-trained

It is important to check whether our reinforcementlearning algorithm is trained well Reference [21] suggestedthat a steadily increasing average of Q-values is evidence thatthe DQN is learning well Figure 4(a) shows the average Q-values of HON and TXN as training progressed We findthat the average Q-values steadily increased indicating thatour proposed model is properly trained In addition weprovide a positive reward when the portfolio closes and anegative reward when the portfolio reaches the stop-lossthreshold or exits Figure 4(b) shows the ratio of the numberof portfolio positions as training progressed The ratio ofclosed to open portfolio positions increased and the ratioof portfolios reaching their stop-loss thresholds to openportfolio positions decreased We also find that the ratio ofportfolio exits to open portfolio positions slightly increasedIt is possible that the rewards given for an open portfolioposition compared to those given for a closed portfolioposition are relatively small The DQN is therefore trainedto prevent portfolios from reaching their stop-loss thresholds(the more important objective) over exiting them This resultcan also serve as a basis for judging whether the proposedmodel is being trained properly

Tables 6 and 7 represent the performance results of XOMand CVX in the training dataset We call our proposedmodel pairs-trading DQN (PTDQN) and traditional pairstrading with constant action values as pairs trading withaction 0 (PTA0) to pairs trading with action 5 (PTA5) Fromthis result we can confirm that our proposed method ismore profitable than the constant pairs-trading strategiesIn addition we can see that the TLS method has a higherprofitability compared to the OLS method From PTA0 toPTA5 the trading boundary and the stop-loss boundarygrew larger the numbers of open and closed portfolios andportfolios that reached their stop-loss thresholds are reducedIn other words there is less opportunity for profit but theprobability of loss is also reduced It is important not only totake a lot of closed positions but also to take the best actionto open and close the portfolio For example if a portfolio is

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 4: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

4 Complexity

Z-sc

ore

trading signaltrading boundarystop-loss boundary

Year

8

6

4

2

0

minus2

minus4

minus6

minus82009200720052003200119991997199519931991

Figure 1 The traditional pairs-trading strategy

of the squared errors [29] Assume that 119909119894 119910119894 and 120576119894 arean independent variable a dependent variable and an errorterm We can estimate 120573 from the following equation bytaking a partial derivative

119910119894 = 120573119909119894 + 120576119894 sim 119873(0 1205902120576 ) (3)

119899sum119894=1

(119910119894 minus 120573119909119894)2 (4)

120573 = ( 119899sum119894=1

1199091198941015840119909119894)minus1 119899sum119894=1

1199091198941015840119910119894 (5)

The value obtained from equation (5) is used for the numberof stock orders The epsilon value is also used as a tradingsignal through Z-scoring in the state composed of theformation-window size

222 Total Least Squares TLS estimates parameters to min-imize the sum of the measured distance and the verticaldistance between regression lines [30] Since the verticaldistance does not change when the X and Y coordinates arechanged the value of 120573 is calculated consistently In the TLSmethod the observed values of 119883119894 and 119884119894 have the followingerror terms

119884119894 = 119910119894 + 119890119894 sim 119873(0 1205902119890 ) (6)

119883119894 = 119909119894 + 119906119894 sim 119873(0 1205902119906) (7)

where 119909119894 and 119910119894 are true values and 119890119894 and 119906119894 are errorterms following independent identical distributions It isassumed that there is linear combination of true values Forconvenience we represent the error variance ratio in equation(10)

119910119894 = 1205730 + 1205731119909119894 (8)

119884119894 = 1205730 + 1205731119909119894 + 119890119894 sim 119873(0 1205902119890 ) (9)

120591 = var (119884119894 | 119909119894)var (119883119894 | 119909119894) = 12059021198901205902119906 (10)

The orthogonal regression estimator is calculated by mini-mizing the sum of the measured distance and the verticaldistance between regression lines in equation (11)

119899sum119894=1

(119884119894 minus 1205730 + 1205731119909119894)2120591 + (119883119894 minus 119909119894)2 (11)

1205731 = 1199042119884119884 minus 1205911199042119883119883 + (1199042119884119884 minus 1205911199042119883119883)2 + 41205911199042119883119884122119904119883119884

(12)

The value obtained from equation (12) is used in the sameway as that obtained from equation (5) and the epsilon valueis also used as a trading signal through the Z-score in the statecomposed of the formation-window size

23 Reinforcement Learning and the Deep Q-Network Theidea of reinforcement learning is to find an optimal policywhich maximizes the expected sum of discounted futurerewards [31] These rewards come from selecting the optimalvalue of each action called the optimal Q-value Rein-forcement learning basically solves the problem defined bythe Markov decision process (MDP) It consists of a tuple(119878 119860 119875 119877 120574) where 119878 is a finite set of states 119860 is a finite setof actions 119875 is a state transition probability matrix 119877 is areward function and 120574 is a discount factor In environment120576 agent-observed state 119904119905 at time 119905 action 119886119905 is selectedFrom the results of these sequences environmental feedbackis provided to the agent in the form of reward 119903119905 and nextstate 119904119905+1 An action is selected by the action-value function119876120587(119904 119886) that represents the expected sum of discountedfuture rewards

119876120587 (119904119905 119886119905) = E120587 [ 119879sum119894=119905

120574119894minus119905119903119894 | 119904119905 119886119905 120587] (13)

In this action-value function 119876120587(119904119905 119886119905) we find an optimalaction-value function 119876lowast(119904119905 119886119905) following an optimal policy

Complexity 5

which maximizes the expected sum of discounted futurerewards

119876lowast (119904119905 119886119905) = max120587

119876120587 (119904119905 119886119905) (14)

This optimal action-value function can be formulated as theBellman equation

119876lowast (119904119905 119886119905) = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1)] (15)

TheDQNuses a nonlinear function approximator to estimatethe action value function This network is trained by min-imizing a sequence of loss functions 119871 119905(120579119905) which changeswith each sequence of 119905 The weight of 120579119905 is updated as thesequence progresses

119871 119905 (120579119905) = E(119904119886)sim120588(∙) [(119910119905 minus 119876 (119904119905 119886119905 120579119905))2] (16)

119910119905 = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1 120579119905minus1) | 119904119905 119886119905] (17)

3 Materials and Methods

31 Data In this study 50 stocks from the SampP 500 Indexwere selected based on their trading volume and marketcapitalization To carry out the experiment the data mustcover the same period Therefore corresponding stocks wereselected leaving a total of 25 stocks Table 1 represents thedataset of stock names abbreviations of those stocks andtheir respective sectors We collected the adjusted dailyclosing prices using Thomson Reutersrsquo database The periodof the training dataset is from January 2 1990 to December31 2008 comprising 4792 data points the test dataset coversthe period from January 2 2009 to July 31 2018 comprising2411 data points From these datasets a pair of stocks willbe selected during the training dataset period using thecointegration test

32 Selecting PairsUsing theCointegrationTest It is necessaryto pair stocks which have long-run statistical relationshipsor similar price movements It is possible to determine thedegree to which two stocks have had similar pricemovementsthrough the correlation value Furthermore the long-termequilibrium of a pair of stocks is an important characteristicfor the execution of pairs trading In this study we usedthe cointegration approach to select pairs of stocks ThroughJohansenrsquos method we selected 11 pairs of stocks that havelong-run equilibria Table 2 shows the resulting pairs of stocksthat were identified based on t-statistics and Figure 2 showsprice movements of the cointegrated stocks XOM and CVXUsing this dataset we will verify whether our proposedmethod has better performance than the traditional pairs-trading method

33 Trading Signal After selecting the pairs it is necessaryto extract the signal for trading To extract signals we optfor the OLS or TLS methods First because the stock pricefollows a random walk [32] we need to ensure that it followsthe 119868(1) process through the augmented Dickey-Fuller testSubsequently the 119868(0) process should be created using the

logarithmic difference in stock prices which is then applied tothe OLS and TLS methods In equation (18) 1205721 is a constantvalue 1205731 is a hedge ratio (which is used as trading size) 120576119905is the error term and log119875119860119905 and log119875119861119905 are the logarithmicdifferences in the stock prices 119860 and 119861 at time 119905 We convertvalues of 120576119905 into a Z-score used as a trading signal Forexample if the trading signal reaches the threshold we shortone share of the overvalued stock (represented as log119875119860119905)and long 1205731 shares of the undervalued stock (representedas log119875119861119905) The hedge ratio is determined based on thewindow size We set a total of six discrete window sizes toobtain the optimal window size for the experiment Tradingwindows are constituted using half of the formation-windowsizeThe spread obtained here is used as a state when applyingreinforcement learning (ie as an input of the DQN)

log119875119861119905 = 1205721 + 1205731 log119875119860119905 + 120576119905 (18)

34 Proposed Method Optimized Pairs-Trading Strategy UsingtheDQNMethod In this study we optimize the pairs-tradingstrategy with a type of game using the DQN We will attemptto implement an optimal pairs-trading strategy by takingoptimal trading and stop-loss boundaries that correspond tothe given spread since performance depends on how tradingand stop-loss boundaries are set in pairs trading [14] Figure 3shows themechanism of our proposed pairs-trading strategyThroughout the cointegration test we identify pairs andusing regression analysis obtain a hedge ratio used as tradingvolume and a spread used as a trading signal and state In thecase of the DQN two hidden layers are set up and the numberof neurons is optimized by taking half of input size throughtrial and error Action values consist of the six discrete spacesin Table 3 Each value of 119886119905 has values for trading and stop-loss boundaries

A pairs-trading system can make a profit if the spreadtouches the threshold and returns to the average such that theportfolio is closed in each tradingwindowOn the other handif the trading boundary is touched and the stop-loss boundaryis reached the system tries to minimize losses by stoppingtrades If the spread touches the trading boundary but fails toreturn to the average the strategy may end up with a profitor a loss In this study the pairs-trading strategy is thereforeconsidered as a kind of game closing a portfolio yields a posi-tive reward and a portfolio that reaches its stop-loss thresholdyields a negative reward Although an exited portfolio maypossibly generate a positive profit there is also a possibilitythat losses will occur and it is therefore set to yield a negativerewardWe set the other conditions (such as the maintenanceof the portfolio or not to execute the portfolio) to zero so asto concentrate on the close stop-loss and exit positions

119882119905 =100381610038161003816100381610038161003816100381610038161003816V119860119905 times

1198781198601199051015840 minus 119878119860119905119878119860119905 + V119861119905 times 1198781198611199051015840 minus 119878119861119905119878119861119905100381610038161003816100381610038161003816100381610038161003816 119905 lt 1199051015840 (19)

119877119905

=

1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119888119897119900119904119890119889minus1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119903119890119886119888ℎ119890119904 119904119905119900119901 minus 119897119900119904119904 119905ℎ119903119890119904ℎ119900119897119889minus500 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119890119909119894119905119890119889

(20)

6 Complexity

Table 1 The 25 stocks on the SampP 500 Index used in this study

No Ticker Stock Sector1 AAPL Apple Inc Technology2 MSFT Microsoft Corporation Technology3 BRKa Berkshire Hathaway Inc Financial Services4 JPM JPMorgan Chase amp Co Financial Services5 JNJ Johnson amp Johnson Healthcare6 XOM Exxon Mobil Corporation Energy7 BAC Bank of America Corporation Financial Services8 WFC Wells Fargo amp Company Financial Services9 WMT Walmart Inc Consumer Defensive10 UNH UnitedHealth Group Incorporated Healthcare11 CVX Chevron Corporation Energy12 T ATampT Inc Communication Services13 PFE Pfizer Inc Healthcare14 ADBE Adobe Systems Incorporated Technology15 MCD McDonaldrsquos Corporation Consumer Cyclical16 MDT Medtronic plc Healthcare17 MMM 3M Company Industrials18 HON Honeywell International Inc Industrials19 GE General Electric Company Industrials20 ABT Abbott Laboratories Healthcare21 MO Altria Group Inc Consumer Defensive22 UNP Union Pacific Corporation Industrials23 TXN Texas Instruments Incorporated Technology24 UTX United Technologies Corporation Industrials25 LLY Eli Lilly and Company Healthcare

Year

Pric

e

20172013200920052001199719931989

140

120

100

80

60

40

20

XOMCVX

Figure 2 Cointegrated stock price movements

We fix the values of portfolio close stop-loss and exitto +1000 minus1000 and minus500 respectively When we updatethe Q-values we must consider the reward as a significantcomponent of efficiently training the DQN We therefore setthe reward value to have a range similar to that of the Q-value Additionally we included the corresponding profit orloss value to reflect that weight after the trading ended Inequation (19) V119860119905 and V119861119905 are the stock orders of stocks119860 and

119861 at time 119905 119878119860119905 and 119878119861119905 are the stock prices of119860 and 119861 at time119905 and 1198781198601199051015840 and 1198781198611199051015840 are the stock prices of119860 and 119861 at time 1199051015840Algorithm 1 shows the process of our proposed method

Beforewe start our proposedmethodwe set a replaymemoryand batch size and select pairs using the cointegration testAt each epoch we initialized total profit to 10 In thetraining scheme we set a state which has spreads withinthe formation window and select actions which are used as

Complexity 7

50 constituent stocks of the SampP500 Index

Filter out pairs based on trading volume liquidity andthe cointegration test

Obtain a rewardEnvironmentConstruct pairs of stocks

Preprocess datasetusing OLS or TLS Select max Q-value

Q_valuesOutputs of DQN

Deep Q-NetworkAgent

SpreadHedge ratio

Inputs of DQN

Figure 3 Steps for proposed pairs-trading strategy using the DQNmethod

Initialize replay memory119863 and batch size119873Initialize deep Q-networkSelect pairs using cointegration test(1) For each epoch do(2) Profit = 10(3) For steps t = 1 until end of training data set do(4) Calculate spreads using OLS or TLS methods(5) Obtain initial state by converting spread to Z-score based on formation window 119904

119905

(6) Using epsilon-greedy method select a random action 119886119905

(7) Otherwise select 119886119905= 119886119903119892119898119886119909

119886119876(119904119905 119886)

(8) Execute traditional pairs-trading strategy based on the action selected(9) Obtain reward 119903

119905by performing the pairs-trading strategy

(10) Set next state 119904119905+1

(11) Store transition (119904119905119886119905 119903119905 119904119905+1) in119863

(12) Sample minibatch of transition (119904119905119886119905 119903119905 119904119905+1) from119863

(13) 119910119905= 119903119905

119894119891 119904119905+1 = 119905119890119903119898119894119899119886119897

119903119905+ 120574119898119886119909

1198861015840119876 (119904119905+11198861015840) 119894119891 119904119905+1 = 119899119900119899 minus 119905119890119903119898119894119899119886119897

(14) Update Q-network by performing a gradient descent step on 119910119905minus119876(119904

119905119886)2

(15) End(16) End

Algorithm 1 Optimized pairs-trading system using DQN

trading and stop-loss boundaries Throughout the tradingwindow we executed a strategy similar to a traditional pairs-trading strategy using the action selected After executingthe strategy we obtain a reward based on the results of theportfolio Finally for the Q-learning process we update theQ-networks by performing a gradient descent step

35 Performance Measure We check our experiment resultsbased on profit maximum drawdown and the Sharpe ratioProfit is commonly used as a performance measure fortrading strategies It is calculated as the sum of returnstaking into consideration trading cost Since many trades canincrease total profit it is necessary to determine the totalprofit taking into consideration transaction costs dependingon trading volume In this study we set a trading cost of 5 bpequation (21) is almost the same as equation (19) but it doesnot include absolute value and 119862 is trading cost Maximumdrawdown represents themaximum cumulative loss from the

highest to the lowest values of the portfolio during a giveninvestment period where119875(119905) is the value of the portfolio and119879 is the terminal time value The Sharpe ratio is an indicatorof the degree of excess profits from investing in risky assetsused in evaluating portfolios [33] In equation (23) 119877119901 is theexpected sum of portfolio returns and 119877119891 is the risk-free ratewe set this value to 0 and 120590119901 is the standard deviation ofportfolio returns

119875119903119900119891119894119905 = 119879sum119905=1

[(V1119905 lowast 11987811199051015840 minus 11987811199051198781119905 + V2119905 lowast 11987821199051015840 minus 11987821199051198782119905 )

minus 119862 lowast (V1119905 + V2119905)](21)

119872119863119863(119879) = max120591isin(0119879)

(max119905isin(0120591)

119875 (119905) minus 119901 (120591)) (22)

8 Complexity

Table 2 Summary statistics for pairs verified using cointegrationtests

No Pairs t-statistic Correlation1 MSFTJPM minus35423lowastlowast 091652 MSFTTXN minus3448lowastlowast 086413 BRKaABT minus35148lowastlowast 094934 BRKaUTX minus33992lowastlowast 096095 JPMT minus35882lowastlowast 084866 JPMHON minus58209lowast lowast lowast 092507 JPMGE minus34494lowastlowast 091058 JNJWFC minus35696lowastlowast 096939 XOMCVX minus405lowast lowast lowast 0987910 HONTXN minus40625lowast lowast lowast 0746911 GETXN minus3467lowastlowast 09148Note lowast lowast lowast and lowastlowast denote a rejection of the null hypothesis at the 1 and5 significance levels respectively

119878ℎ119886119903119901119890 119903119886119905119894119900 = 119877119901 minus 119877119891120590119901 (23)

The Materials and Methods section should contain sufficientdetails so that all procedures can be repeated It may bedivided into headed subsections if several methods aredescribed

4 Results and Discussion

We use the stock pair XOM and CVX which rejects the nullhypothesis at the 1 significance level to verify whether ourproposed model is trained well The lengths of the windowsizes such as the formation window and trading windoware selected from the performance results with the trainingdataset From these results we select an optimized windowsize and compare our proposed model with traditional pairstrading which takes a constant set of actions with the testdataset

41 Training Results To find the optimum window sizefor the optimized pairs-trading system we experimentedwith six cases We performed the experiments based onsix window sizes and the results for each window size arecalculated by averaging the top-5 results for a total of 11 pairsFrom Tables 4 and 5 we can find that the best performanceis obtained when the formation and training windows are 30and 15 respectively based on the profit generated by both theOLS and TLS methods When we trained our networks weset a positive reward for taking more closed positions andfewer stop-loss and exit positionsWe canfind the lowest ratioof portfolio closed positions based on the number of openpositions which in the formation and trading windows arefor 30 and 15 days (068) Contrary to this result the highestratios of the number of closed positions in the formation andtrading windows are for 120 and 60 days (073) Howeverthe highest profits reported in the formation and tradingwindows are for 30 and 15 days This can be explained whenwe check the ratio of the number of stop-loss portfolios

The formation and trading window sizes are 30 and 15 daysand the ratio of portfolio stop-loss position is 013 but theformation and trading window sizes are 020 This resultindicates that it is important to reduce the stop-loss positionwhile increasing the closed position In addition we can seethat the trading signals made with the TLS method are betterthan thosemadewith theOLSmethod in all six of the discretewindow sizes The reason for this is based on the differencebetween the hedge ratios of the two methods In OLS whenone side is the reference the relative change of the other sideis estimated Since the assumption is that there is no errorcomponent on the reference side and there is an error onlyon the other side the hedge ratio varies depending on the sideused as the reference However in TLS hedging ratios are thesame regardless of which side is used as the reference For thisreason the experimental results confirm that the TLSmethodis better able to determine when to execute the pairs-tradingstrategy From these results we take the optimum windowsize when we verify our proposed method in the test datasetHowever we first need to ensure that the model we proposedis well-trained

It is important to check whether our reinforcementlearning algorithm is trained well Reference [21] suggestedthat a steadily increasing average of Q-values is evidence thatthe DQN is learning well Figure 4(a) shows the average Q-values of HON and TXN as training progressed We findthat the average Q-values steadily increased indicating thatour proposed model is properly trained In addition weprovide a positive reward when the portfolio closes and anegative reward when the portfolio reaches the stop-lossthreshold or exits Figure 4(b) shows the ratio of the numberof portfolio positions as training progressed The ratio ofclosed to open portfolio positions increased and the ratioof portfolios reaching their stop-loss thresholds to openportfolio positions decreased We also find that the ratio ofportfolio exits to open portfolio positions slightly increasedIt is possible that the rewards given for an open portfolioposition compared to those given for a closed portfolioposition are relatively small The DQN is therefore trainedto prevent portfolios from reaching their stop-loss thresholds(the more important objective) over exiting them This resultcan also serve as a basis for judging whether the proposedmodel is being trained properly

Tables 6 and 7 represent the performance results of XOMand CVX in the training dataset We call our proposedmodel pairs-trading DQN (PTDQN) and traditional pairstrading with constant action values as pairs trading withaction 0 (PTA0) to pairs trading with action 5 (PTA5) Fromthis result we can confirm that our proposed method ismore profitable than the constant pairs-trading strategiesIn addition we can see that the TLS method has a higherprofitability compared to the OLS method From PTA0 toPTA5 the trading boundary and the stop-loss boundarygrew larger the numbers of open and closed portfolios andportfolios that reached their stop-loss thresholds are reducedIn other words there is less opportunity for profit but theprobability of loss is also reduced It is important not only totake a lot of closed positions but also to take the best actionto open and close the portfolio For example if a portfolio is

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 5: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 5

which maximizes the expected sum of discounted futurerewards

119876lowast (119904119905 119886119905) = max120587

119876120587 (119904119905 119886119905) (14)

This optimal action-value function can be formulated as theBellman equation

119876lowast (119904119905 119886119905) = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1)] (15)

TheDQNuses a nonlinear function approximator to estimatethe action value function This network is trained by min-imizing a sequence of loss functions 119871 119905(120579119905) which changeswith each sequence of 119905 The weight of 120579119905 is updated as thesequence progresses

119871 119905 (120579119905) = E(119904119886)sim120588(∙) [(119910119905 minus 119876 (119904119905 119886119905 120579119905))2] (16)

119910119905 = max119886119905+1

[119903119905 + 120574119876 (119904119905+1 119886119905+1 120579119905minus1) | 119904119905 119886119905] (17)

3 Materials and Methods

31 Data In this study 50 stocks from the SampP 500 Indexwere selected based on their trading volume and marketcapitalization To carry out the experiment the data mustcover the same period Therefore corresponding stocks wereselected leaving a total of 25 stocks Table 1 represents thedataset of stock names abbreviations of those stocks andtheir respective sectors We collected the adjusted dailyclosing prices using Thomson Reutersrsquo database The periodof the training dataset is from January 2 1990 to December31 2008 comprising 4792 data points the test dataset coversthe period from January 2 2009 to July 31 2018 comprising2411 data points From these datasets a pair of stocks willbe selected during the training dataset period using thecointegration test

32 Selecting PairsUsing theCointegrationTest It is necessaryto pair stocks which have long-run statistical relationshipsor similar price movements It is possible to determine thedegree to which two stocks have had similar pricemovementsthrough the correlation value Furthermore the long-termequilibrium of a pair of stocks is an important characteristicfor the execution of pairs trading In this study we usedthe cointegration approach to select pairs of stocks ThroughJohansenrsquos method we selected 11 pairs of stocks that havelong-run equilibria Table 2 shows the resulting pairs of stocksthat were identified based on t-statistics and Figure 2 showsprice movements of the cointegrated stocks XOM and CVXUsing this dataset we will verify whether our proposedmethod has better performance than the traditional pairs-trading method

33 Trading Signal After selecting the pairs it is necessaryto extract the signal for trading To extract signals we optfor the OLS or TLS methods First because the stock pricefollows a random walk [32] we need to ensure that it followsthe 119868(1) process through the augmented Dickey-Fuller testSubsequently the 119868(0) process should be created using the

logarithmic difference in stock prices which is then applied tothe OLS and TLS methods In equation (18) 1205721 is a constantvalue 1205731 is a hedge ratio (which is used as trading size) 120576119905is the error term and log119875119860119905 and log119875119861119905 are the logarithmicdifferences in the stock prices 119860 and 119861 at time 119905 We convertvalues of 120576119905 into a Z-score used as a trading signal Forexample if the trading signal reaches the threshold we shortone share of the overvalued stock (represented as log119875119860119905)and long 1205731 shares of the undervalued stock (representedas log119875119861119905) The hedge ratio is determined based on thewindow size We set a total of six discrete window sizes toobtain the optimal window size for the experiment Tradingwindows are constituted using half of the formation-windowsizeThe spread obtained here is used as a state when applyingreinforcement learning (ie as an input of the DQN)

log119875119861119905 = 1205721 + 1205731 log119875119860119905 + 120576119905 (18)

34 Proposed Method Optimized Pairs-Trading Strategy UsingtheDQNMethod In this study we optimize the pairs-tradingstrategy with a type of game using the DQN We will attemptto implement an optimal pairs-trading strategy by takingoptimal trading and stop-loss boundaries that correspond tothe given spread since performance depends on how tradingand stop-loss boundaries are set in pairs trading [14] Figure 3shows themechanism of our proposed pairs-trading strategyThroughout the cointegration test we identify pairs andusing regression analysis obtain a hedge ratio used as tradingvolume and a spread used as a trading signal and state In thecase of the DQN two hidden layers are set up and the numberof neurons is optimized by taking half of input size throughtrial and error Action values consist of the six discrete spacesin Table 3 Each value of 119886119905 has values for trading and stop-loss boundaries

A pairs-trading system can make a profit if the spreadtouches the threshold and returns to the average such that theportfolio is closed in each tradingwindowOn the other handif the trading boundary is touched and the stop-loss boundaryis reached the system tries to minimize losses by stoppingtrades If the spread touches the trading boundary but fails toreturn to the average the strategy may end up with a profitor a loss In this study the pairs-trading strategy is thereforeconsidered as a kind of game closing a portfolio yields a posi-tive reward and a portfolio that reaches its stop-loss thresholdyields a negative reward Although an exited portfolio maypossibly generate a positive profit there is also a possibilitythat losses will occur and it is therefore set to yield a negativerewardWe set the other conditions (such as the maintenanceof the portfolio or not to execute the portfolio) to zero so asto concentrate on the close stop-loss and exit positions

119882119905 =100381610038161003816100381610038161003816100381610038161003816V119860119905 times

1198781198601199051015840 minus 119878119860119905119878119860119905 + V119861119905 times 1198781198611199051015840 minus 119878119861119905119878119861119905100381610038161003816100381610038161003816100381610038161003816 119905 lt 1199051015840 (19)

119877119905

=

1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119888119897119900119904119890119889minus1000 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119903119890119886119888ℎ119890119904 119904119905119900119901 minus 119897119900119904119904 119905ℎ119903119890119904ℎ119900119897119889minus500 times119882119905 119894119891 119901119900119903119905119891119900119897119894119900 119890119909119894119905119890119889

(20)

6 Complexity

Table 1 The 25 stocks on the SampP 500 Index used in this study

No Ticker Stock Sector1 AAPL Apple Inc Technology2 MSFT Microsoft Corporation Technology3 BRKa Berkshire Hathaway Inc Financial Services4 JPM JPMorgan Chase amp Co Financial Services5 JNJ Johnson amp Johnson Healthcare6 XOM Exxon Mobil Corporation Energy7 BAC Bank of America Corporation Financial Services8 WFC Wells Fargo amp Company Financial Services9 WMT Walmart Inc Consumer Defensive10 UNH UnitedHealth Group Incorporated Healthcare11 CVX Chevron Corporation Energy12 T ATampT Inc Communication Services13 PFE Pfizer Inc Healthcare14 ADBE Adobe Systems Incorporated Technology15 MCD McDonaldrsquos Corporation Consumer Cyclical16 MDT Medtronic plc Healthcare17 MMM 3M Company Industrials18 HON Honeywell International Inc Industrials19 GE General Electric Company Industrials20 ABT Abbott Laboratories Healthcare21 MO Altria Group Inc Consumer Defensive22 UNP Union Pacific Corporation Industrials23 TXN Texas Instruments Incorporated Technology24 UTX United Technologies Corporation Industrials25 LLY Eli Lilly and Company Healthcare

Year

Pric

e

20172013200920052001199719931989

140

120

100

80

60

40

20

XOMCVX

Figure 2 Cointegrated stock price movements

We fix the values of portfolio close stop-loss and exitto +1000 minus1000 and minus500 respectively When we updatethe Q-values we must consider the reward as a significantcomponent of efficiently training the DQN We therefore setthe reward value to have a range similar to that of the Q-value Additionally we included the corresponding profit orloss value to reflect that weight after the trading ended Inequation (19) V119860119905 and V119861119905 are the stock orders of stocks119860 and

119861 at time 119905 119878119860119905 and 119878119861119905 are the stock prices of119860 and 119861 at time119905 and 1198781198601199051015840 and 1198781198611199051015840 are the stock prices of119860 and 119861 at time 1199051015840Algorithm 1 shows the process of our proposed method

Beforewe start our proposedmethodwe set a replaymemoryand batch size and select pairs using the cointegration testAt each epoch we initialized total profit to 10 In thetraining scheme we set a state which has spreads withinthe formation window and select actions which are used as

Complexity 7

50 constituent stocks of the SampP500 Index

Filter out pairs based on trading volume liquidity andthe cointegration test

Obtain a rewardEnvironmentConstruct pairs of stocks

Preprocess datasetusing OLS or TLS Select max Q-value

Q_valuesOutputs of DQN

Deep Q-NetworkAgent

SpreadHedge ratio

Inputs of DQN

Figure 3 Steps for proposed pairs-trading strategy using the DQNmethod

Initialize replay memory119863 and batch size119873Initialize deep Q-networkSelect pairs using cointegration test(1) For each epoch do(2) Profit = 10(3) For steps t = 1 until end of training data set do(4) Calculate spreads using OLS or TLS methods(5) Obtain initial state by converting spread to Z-score based on formation window 119904

119905

(6) Using epsilon-greedy method select a random action 119886119905

(7) Otherwise select 119886119905= 119886119903119892119898119886119909

119886119876(119904119905 119886)

(8) Execute traditional pairs-trading strategy based on the action selected(9) Obtain reward 119903

119905by performing the pairs-trading strategy

(10) Set next state 119904119905+1

(11) Store transition (119904119905119886119905 119903119905 119904119905+1) in119863

(12) Sample minibatch of transition (119904119905119886119905 119903119905 119904119905+1) from119863

(13) 119910119905= 119903119905

119894119891 119904119905+1 = 119905119890119903119898119894119899119886119897

119903119905+ 120574119898119886119909

1198861015840119876 (119904119905+11198861015840) 119894119891 119904119905+1 = 119899119900119899 minus 119905119890119903119898119894119899119886119897

(14) Update Q-network by performing a gradient descent step on 119910119905minus119876(119904

119905119886)2

(15) End(16) End

Algorithm 1 Optimized pairs-trading system using DQN

trading and stop-loss boundaries Throughout the tradingwindow we executed a strategy similar to a traditional pairs-trading strategy using the action selected After executingthe strategy we obtain a reward based on the results of theportfolio Finally for the Q-learning process we update theQ-networks by performing a gradient descent step

35 Performance Measure We check our experiment resultsbased on profit maximum drawdown and the Sharpe ratioProfit is commonly used as a performance measure fortrading strategies It is calculated as the sum of returnstaking into consideration trading cost Since many trades canincrease total profit it is necessary to determine the totalprofit taking into consideration transaction costs dependingon trading volume In this study we set a trading cost of 5 bpequation (21) is almost the same as equation (19) but it doesnot include absolute value and 119862 is trading cost Maximumdrawdown represents themaximum cumulative loss from the

highest to the lowest values of the portfolio during a giveninvestment period where119875(119905) is the value of the portfolio and119879 is the terminal time value The Sharpe ratio is an indicatorof the degree of excess profits from investing in risky assetsused in evaluating portfolios [33] In equation (23) 119877119901 is theexpected sum of portfolio returns and 119877119891 is the risk-free ratewe set this value to 0 and 120590119901 is the standard deviation ofportfolio returns

119875119903119900119891119894119905 = 119879sum119905=1

[(V1119905 lowast 11987811199051015840 minus 11987811199051198781119905 + V2119905 lowast 11987821199051015840 minus 11987821199051198782119905 )

minus 119862 lowast (V1119905 + V2119905)](21)

119872119863119863(119879) = max120591isin(0119879)

(max119905isin(0120591)

119875 (119905) minus 119901 (120591)) (22)

8 Complexity

Table 2 Summary statistics for pairs verified using cointegrationtests

No Pairs t-statistic Correlation1 MSFTJPM minus35423lowastlowast 091652 MSFTTXN minus3448lowastlowast 086413 BRKaABT minus35148lowastlowast 094934 BRKaUTX minus33992lowastlowast 096095 JPMT minus35882lowastlowast 084866 JPMHON minus58209lowast lowast lowast 092507 JPMGE minus34494lowastlowast 091058 JNJWFC minus35696lowastlowast 096939 XOMCVX minus405lowast lowast lowast 0987910 HONTXN minus40625lowast lowast lowast 0746911 GETXN minus3467lowastlowast 09148Note lowast lowast lowast and lowastlowast denote a rejection of the null hypothesis at the 1 and5 significance levels respectively

119878ℎ119886119903119901119890 119903119886119905119894119900 = 119877119901 minus 119877119891120590119901 (23)

The Materials and Methods section should contain sufficientdetails so that all procedures can be repeated It may bedivided into headed subsections if several methods aredescribed

4 Results and Discussion

We use the stock pair XOM and CVX which rejects the nullhypothesis at the 1 significance level to verify whether ourproposed model is trained well The lengths of the windowsizes such as the formation window and trading windoware selected from the performance results with the trainingdataset From these results we select an optimized windowsize and compare our proposed model with traditional pairstrading which takes a constant set of actions with the testdataset

41 Training Results To find the optimum window sizefor the optimized pairs-trading system we experimentedwith six cases We performed the experiments based onsix window sizes and the results for each window size arecalculated by averaging the top-5 results for a total of 11 pairsFrom Tables 4 and 5 we can find that the best performanceis obtained when the formation and training windows are 30and 15 respectively based on the profit generated by both theOLS and TLS methods When we trained our networks weset a positive reward for taking more closed positions andfewer stop-loss and exit positionsWe canfind the lowest ratioof portfolio closed positions based on the number of openpositions which in the formation and trading windows arefor 30 and 15 days (068) Contrary to this result the highestratios of the number of closed positions in the formation andtrading windows are for 120 and 60 days (073) Howeverthe highest profits reported in the formation and tradingwindows are for 30 and 15 days This can be explained whenwe check the ratio of the number of stop-loss portfolios

The formation and trading window sizes are 30 and 15 daysand the ratio of portfolio stop-loss position is 013 but theformation and trading window sizes are 020 This resultindicates that it is important to reduce the stop-loss positionwhile increasing the closed position In addition we can seethat the trading signals made with the TLS method are betterthan thosemadewith theOLSmethod in all six of the discretewindow sizes The reason for this is based on the differencebetween the hedge ratios of the two methods In OLS whenone side is the reference the relative change of the other sideis estimated Since the assumption is that there is no errorcomponent on the reference side and there is an error onlyon the other side the hedge ratio varies depending on the sideused as the reference However in TLS hedging ratios are thesame regardless of which side is used as the reference For thisreason the experimental results confirm that the TLSmethodis better able to determine when to execute the pairs-tradingstrategy From these results we take the optimum windowsize when we verify our proposed method in the test datasetHowever we first need to ensure that the model we proposedis well-trained

It is important to check whether our reinforcementlearning algorithm is trained well Reference [21] suggestedthat a steadily increasing average of Q-values is evidence thatthe DQN is learning well Figure 4(a) shows the average Q-values of HON and TXN as training progressed We findthat the average Q-values steadily increased indicating thatour proposed model is properly trained In addition weprovide a positive reward when the portfolio closes and anegative reward when the portfolio reaches the stop-lossthreshold or exits Figure 4(b) shows the ratio of the numberof portfolio positions as training progressed The ratio ofclosed to open portfolio positions increased and the ratioof portfolios reaching their stop-loss thresholds to openportfolio positions decreased We also find that the ratio ofportfolio exits to open portfolio positions slightly increasedIt is possible that the rewards given for an open portfolioposition compared to those given for a closed portfolioposition are relatively small The DQN is therefore trainedto prevent portfolios from reaching their stop-loss thresholds(the more important objective) over exiting them This resultcan also serve as a basis for judging whether the proposedmodel is being trained properly

Tables 6 and 7 represent the performance results of XOMand CVX in the training dataset We call our proposedmodel pairs-trading DQN (PTDQN) and traditional pairstrading with constant action values as pairs trading withaction 0 (PTA0) to pairs trading with action 5 (PTA5) Fromthis result we can confirm that our proposed method ismore profitable than the constant pairs-trading strategiesIn addition we can see that the TLS method has a higherprofitability compared to the OLS method From PTA0 toPTA5 the trading boundary and the stop-loss boundarygrew larger the numbers of open and closed portfolios andportfolios that reached their stop-loss thresholds are reducedIn other words there is less opportunity for profit but theprobability of loss is also reduced It is important not only totake a lot of closed positions but also to take the best actionto open and close the portfolio For example if a portfolio is

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 6: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

6 Complexity

Table 1 The 25 stocks on the SampP 500 Index used in this study

No Ticker Stock Sector1 AAPL Apple Inc Technology2 MSFT Microsoft Corporation Technology3 BRKa Berkshire Hathaway Inc Financial Services4 JPM JPMorgan Chase amp Co Financial Services5 JNJ Johnson amp Johnson Healthcare6 XOM Exxon Mobil Corporation Energy7 BAC Bank of America Corporation Financial Services8 WFC Wells Fargo amp Company Financial Services9 WMT Walmart Inc Consumer Defensive10 UNH UnitedHealth Group Incorporated Healthcare11 CVX Chevron Corporation Energy12 T ATampT Inc Communication Services13 PFE Pfizer Inc Healthcare14 ADBE Adobe Systems Incorporated Technology15 MCD McDonaldrsquos Corporation Consumer Cyclical16 MDT Medtronic plc Healthcare17 MMM 3M Company Industrials18 HON Honeywell International Inc Industrials19 GE General Electric Company Industrials20 ABT Abbott Laboratories Healthcare21 MO Altria Group Inc Consumer Defensive22 UNP Union Pacific Corporation Industrials23 TXN Texas Instruments Incorporated Technology24 UTX United Technologies Corporation Industrials25 LLY Eli Lilly and Company Healthcare

Year

Pric

e

20172013200920052001199719931989

140

120

100

80

60

40

20

XOMCVX

Figure 2 Cointegrated stock price movements

We fix the values of portfolio close stop-loss and exitto +1000 minus1000 and minus500 respectively When we updatethe Q-values we must consider the reward as a significantcomponent of efficiently training the DQN We therefore setthe reward value to have a range similar to that of the Q-value Additionally we included the corresponding profit orloss value to reflect that weight after the trading ended Inequation (19) V119860119905 and V119861119905 are the stock orders of stocks119860 and

119861 at time 119905 119878119860119905 and 119878119861119905 are the stock prices of119860 and 119861 at time119905 and 1198781198601199051015840 and 1198781198611199051015840 are the stock prices of119860 and 119861 at time 1199051015840Algorithm 1 shows the process of our proposed method

Beforewe start our proposedmethodwe set a replaymemoryand batch size and select pairs using the cointegration testAt each epoch we initialized total profit to 10 In thetraining scheme we set a state which has spreads withinthe formation window and select actions which are used as

Complexity 7

50 constituent stocks of the SampP500 Index

Filter out pairs based on trading volume liquidity andthe cointegration test

Obtain a rewardEnvironmentConstruct pairs of stocks

Preprocess datasetusing OLS or TLS Select max Q-value

Q_valuesOutputs of DQN

Deep Q-NetworkAgent

SpreadHedge ratio

Inputs of DQN

Figure 3 Steps for proposed pairs-trading strategy using the DQNmethod

Initialize replay memory119863 and batch size119873Initialize deep Q-networkSelect pairs using cointegration test(1) For each epoch do(2) Profit = 10(3) For steps t = 1 until end of training data set do(4) Calculate spreads using OLS or TLS methods(5) Obtain initial state by converting spread to Z-score based on formation window 119904

119905

(6) Using epsilon-greedy method select a random action 119886119905

(7) Otherwise select 119886119905= 119886119903119892119898119886119909

119886119876(119904119905 119886)

(8) Execute traditional pairs-trading strategy based on the action selected(9) Obtain reward 119903

119905by performing the pairs-trading strategy

(10) Set next state 119904119905+1

(11) Store transition (119904119905119886119905 119903119905 119904119905+1) in119863

(12) Sample minibatch of transition (119904119905119886119905 119903119905 119904119905+1) from119863

(13) 119910119905= 119903119905

119894119891 119904119905+1 = 119905119890119903119898119894119899119886119897

119903119905+ 120574119898119886119909

1198861015840119876 (119904119905+11198861015840) 119894119891 119904119905+1 = 119899119900119899 minus 119905119890119903119898119894119899119886119897

(14) Update Q-network by performing a gradient descent step on 119910119905minus119876(119904

119905119886)2

(15) End(16) End

Algorithm 1 Optimized pairs-trading system using DQN

trading and stop-loss boundaries Throughout the tradingwindow we executed a strategy similar to a traditional pairs-trading strategy using the action selected After executingthe strategy we obtain a reward based on the results of theportfolio Finally for the Q-learning process we update theQ-networks by performing a gradient descent step

35 Performance Measure We check our experiment resultsbased on profit maximum drawdown and the Sharpe ratioProfit is commonly used as a performance measure fortrading strategies It is calculated as the sum of returnstaking into consideration trading cost Since many trades canincrease total profit it is necessary to determine the totalprofit taking into consideration transaction costs dependingon trading volume In this study we set a trading cost of 5 bpequation (21) is almost the same as equation (19) but it doesnot include absolute value and 119862 is trading cost Maximumdrawdown represents themaximum cumulative loss from the

highest to the lowest values of the portfolio during a giveninvestment period where119875(119905) is the value of the portfolio and119879 is the terminal time value The Sharpe ratio is an indicatorof the degree of excess profits from investing in risky assetsused in evaluating portfolios [33] In equation (23) 119877119901 is theexpected sum of portfolio returns and 119877119891 is the risk-free ratewe set this value to 0 and 120590119901 is the standard deviation ofportfolio returns

119875119903119900119891119894119905 = 119879sum119905=1

[(V1119905 lowast 11987811199051015840 minus 11987811199051198781119905 + V2119905 lowast 11987821199051015840 minus 11987821199051198782119905 )

minus 119862 lowast (V1119905 + V2119905)](21)

119872119863119863(119879) = max120591isin(0119879)

(max119905isin(0120591)

119875 (119905) minus 119901 (120591)) (22)

8 Complexity

Table 2 Summary statistics for pairs verified using cointegrationtests

No Pairs t-statistic Correlation1 MSFTJPM minus35423lowastlowast 091652 MSFTTXN minus3448lowastlowast 086413 BRKaABT minus35148lowastlowast 094934 BRKaUTX minus33992lowastlowast 096095 JPMT minus35882lowastlowast 084866 JPMHON minus58209lowast lowast lowast 092507 JPMGE minus34494lowastlowast 091058 JNJWFC minus35696lowastlowast 096939 XOMCVX minus405lowast lowast lowast 0987910 HONTXN minus40625lowast lowast lowast 0746911 GETXN minus3467lowastlowast 09148Note lowast lowast lowast and lowastlowast denote a rejection of the null hypothesis at the 1 and5 significance levels respectively

119878ℎ119886119903119901119890 119903119886119905119894119900 = 119877119901 minus 119877119891120590119901 (23)

The Materials and Methods section should contain sufficientdetails so that all procedures can be repeated It may bedivided into headed subsections if several methods aredescribed

4 Results and Discussion

We use the stock pair XOM and CVX which rejects the nullhypothesis at the 1 significance level to verify whether ourproposed model is trained well The lengths of the windowsizes such as the formation window and trading windoware selected from the performance results with the trainingdataset From these results we select an optimized windowsize and compare our proposed model with traditional pairstrading which takes a constant set of actions with the testdataset

41 Training Results To find the optimum window sizefor the optimized pairs-trading system we experimentedwith six cases We performed the experiments based onsix window sizes and the results for each window size arecalculated by averaging the top-5 results for a total of 11 pairsFrom Tables 4 and 5 we can find that the best performanceis obtained when the formation and training windows are 30and 15 respectively based on the profit generated by both theOLS and TLS methods When we trained our networks weset a positive reward for taking more closed positions andfewer stop-loss and exit positionsWe canfind the lowest ratioof portfolio closed positions based on the number of openpositions which in the formation and trading windows arefor 30 and 15 days (068) Contrary to this result the highestratios of the number of closed positions in the formation andtrading windows are for 120 and 60 days (073) Howeverthe highest profits reported in the formation and tradingwindows are for 30 and 15 days This can be explained whenwe check the ratio of the number of stop-loss portfolios

The formation and trading window sizes are 30 and 15 daysand the ratio of portfolio stop-loss position is 013 but theformation and trading window sizes are 020 This resultindicates that it is important to reduce the stop-loss positionwhile increasing the closed position In addition we can seethat the trading signals made with the TLS method are betterthan thosemadewith theOLSmethod in all six of the discretewindow sizes The reason for this is based on the differencebetween the hedge ratios of the two methods In OLS whenone side is the reference the relative change of the other sideis estimated Since the assumption is that there is no errorcomponent on the reference side and there is an error onlyon the other side the hedge ratio varies depending on the sideused as the reference However in TLS hedging ratios are thesame regardless of which side is used as the reference For thisreason the experimental results confirm that the TLSmethodis better able to determine when to execute the pairs-tradingstrategy From these results we take the optimum windowsize when we verify our proposed method in the test datasetHowever we first need to ensure that the model we proposedis well-trained

It is important to check whether our reinforcementlearning algorithm is trained well Reference [21] suggestedthat a steadily increasing average of Q-values is evidence thatthe DQN is learning well Figure 4(a) shows the average Q-values of HON and TXN as training progressed We findthat the average Q-values steadily increased indicating thatour proposed model is properly trained In addition weprovide a positive reward when the portfolio closes and anegative reward when the portfolio reaches the stop-lossthreshold or exits Figure 4(b) shows the ratio of the numberof portfolio positions as training progressed The ratio ofclosed to open portfolio positions increased and the ratioof portfolios reaching their stop-loss thresholds to openportfolio positions decreased We also find that the ratio ofportfolio exits to open portfolio positions slightly increasedIt is possible that the rewards given for an open portfolioposition compared to those given for a closed portfolioposition are relatively small The DQN is therefore trainedto prevent portfolios from reaching their stop-loss thresholds(the more important objective) over exiting them This resultcan also serve as a basis for judging whether the proposedmodel is being trained properly

Tables 6 and 7 represent the performance results of XOMand CVX in the training dataset We call our proposedmodel pairs-trading DQN (PTDQN) and traditional pairstrading with constant action values as pairs trading withaction 0 (PTA0) to pairs trading with action 5 (PTA5) Fromthis result we can confirm that our proposed method ismore profitable than the constant pairs-trading strategiesIn addition we can see that the TLS method has a higherprofitability compared to the OLS method From PTA0 toPTA5 the trading boundary and the stop-loss boundarygrew larger the numbers of open and closed portfolios andportfolios that reached their stop-loss thresholds are reducedIn other words there is less opportunity for profit but theprobability of loss is also reduced It is important not only totake a lot of closed positions but also to take the best actionto open and close the portfolio For example if a portfolio is

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 7: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 7

50 constituent stocks of the SampP500 Index

Filter out pairs based on trading volume liquidity andthe cointegration test

Obtain a rewardEnvironmentConstruct pairs of stocks

Preprocess datasetusing OLS or TLS Select max Q-value

Q_valuesOutputs of DQN

Deep Q-NetworkAgent

SpreadHedge ratio

Inputs of DQN

Figure 3 Steps for proposed pairs-trading strategy using the DQNmethod

Initialize replay memory119863 and batch size119873Initialize deep Q-networkSelect pairs using cointegration test(1) For each epoch do(2) Profit = 10(3) For steps t = 1 until end of training data set do(4) Calculate spreads using OLS or TLS methods(5) Obtain initial state by converting spread to Z-score based on formation window 119904

119905

(6) Using epsilon-greedy method select a random action 119886119905

(7) Otherwise select 119886119905= 119886119903119892119898119886119909

119886119876(119904119905 119886)

(8) Execute traditional pairs-trading strategy based on the action selected(9) Obtain reward 119903

119905by performing the pairs-trading strategy

(10) Set next state 119904119905+1

(11) Store transition (119904119905119886119905 119903119905 119904119905+1) in119863

(12) Sample minibatch of transition (119904119905119886119905 119903119905 119904119905+1) from119863

(13) 119910119905= 119903119905

119894119891 119904119905+1 = 119905119890119903119898119894119899119886119897

119903119905+ 120574119898119886119909

1198861015840119876 (119904119905+11198861015840) 119894119891 119904119905+1 = 119899119900119899 minus 119905119890119903119898119894119899119886119897

(14) Update Q-network by performing a gradient descent step on 119910119905minus119876(119904

119905119886)2

(15) End(16) End

Algorithm 1 Optimized pairs-trading system using DQN

trading and stop-loss boundaries Throughout the tradingwindow we executed a strategy similar to a traditional pairs-trading strategy using the action selected After executingthe strategy we obtain a reward based on the results of theportfolio Finally for the Q-learning process we update theQ-networks by performing a gradient descent step

35 Performance Measure We check our experiment resultsbased on profit maximum drawdown and the Sharpe ratioProfit is commonly used as a performance measure fortrading strategies It is calculated as the sum of returnstaking into consideration trading cost Since many trades canincrease total profit it is necessary to determine the totalprofit taking into consideration transaction costs dependingon trading volume In this study we set a trading cost of 5 bpequation (21) is almost the same as equation (19) but it doesnot include absolute value and 119862 is trading cost Maximumdrawdown represents themaximum cumulative loss from the

highest to the lowest values of the portfolio during a giveninvestment period where119875(119905) is the value of the portfolio and119879 is the terminal time value The Sharpe ratio is an indicatorof the degree of excess profits from investing in risky assetsused in evaluating portfolios [33] In equation (23) 119877119901 is theexpected sum of portfolio returns and 119877119891 is the risk-free ratewe set this value to 0 and 120590119901 is the standard deviation ofportfolio returns

119875119903119900119891119894119905 = 119879sum119905=1

[(V1119905 lowast 11987811199051015840 minus 11987811199051198781119905 + V2119905 lowast 11987821199051015840 minus 11987821199051198782119905 )

minus 119862 lowast (V1119905 + V2119905)](21)

119872119863119863(119879) = max120591isin(0119879)

(max119905isin(0120591)

119875 (119905) minus 119901 (120591)) (22)

8 Complexity

Table 2 Summary statistics for pairs verified using cointegrationtests

No Pairs t-statistic Correlation1 MSFTJPM minus35423lowastlowast 091652 MSFTTXN minus3448lowastlowast 086413 BRKaABT minus35148lowastlowast 094934 BRKaUTX minus33992lowastlowast 096095 JPMT minus35882lowastlowast 084866 JPMHON minus58209lowast lowast lowast 092507 JPMGE minus34494lowastlowast 091058 JNJWFC minus35696lowastlowast 096939 XOMCVX minus405lowast lowast lowast 0987910 HONTXN minus40625lowast lowast lowast 0746911 GETXN minus3467lowastlowast 09148Note lowast lowast lowast and lowastlowast denote a rejection of the null hypothesis at the 1 and5 significance levels respectively

119878ℎ119886119903119901119890 119903119886119905119894119900 = 119877119901 minus 119877119891120590119901 (23)

The Materials and Methods section should contain sufficientdetails so that all procedures can be repeated It may bedivided into headed subsections if several methods aredescribed

4 Results and Discussion

We use the stock pair XOM and CVX which rejects the nullhypothesis at the 1 significance level to verify whether ourproposed model is trained well The lengths of the windowsizes such as the formation window and trading windoware selected from the performance results with the trainingdataset From these results we select an optimized windowsize and compare our proposed model with traditional pairstrading which takes a constant set of actions with the testdataset

41 Training Results To find the optimum window sizefor the optimized pairs-trading system we experimentedwith six cases We performed the experiments based onsix window sizes and the results for each window size arecalculated by averaging the top-5 results for a total of 11 pairsFrom Tables 4 and 5 we can find that the best performanceis obtained when the formation and training windows are 30and 15 respectively based on the profit generated by both theOLS and TLS methods When we trained our networks weset a positive reward for taking more closed positions andfewer stop-loss and exit positionsWe canfind the lowest ratioof portfolio closed positions based on the number of openpositions which in the formation and trading windows arefor 30 and 15 days (068) Contrary to this result the highestratios of the number of closed positions in the formation andtrading windows are for 120 and 60 days (073) Howeverthe highest profits reported in the formation and tradingwindows are for 30 and 15 days This can be explained whenwe check the ratio of the number of stop-loss portfolios

The formation and trading window sizes are 30 and 15 daysand the ratio of portfolio stop-loss position is 013 but theformation and trading window sizes are 020 This resultindicates that it is important to reduce the stop-loss positionwhile increasing the closed position In addition we can seethat the trading signals made with the TLS method are betterthan thosemadewith theOLSmethod in all six of the discretewindow sizes The reason for this is based on the differencebetween the hedge ratios of the two methods In OLS whenone side is the reference the relative change of the other sideis estimated Since the assumption is that there is no errorcomponent on the reference side and there is an error onlyon the other side the hedge ratio varies depending on the sideused as the reference However in TLS hedging ratios are thesame regardless of which side is used as the reference For thisreason the experimental results confirm that the TLSmethodis better able to determine when to execute the pairs-tradingstrategy From these results we take the optimum windowsize when we verify our proposed method in the test datasetHowever we first need to ensure that the model we proposedis well-trained

It is important to check whether our reinforcementlearning algorithm is trained well Reference [21] suggestedthat a steadily increasing average of Q-values is evidence thatthe DQN is learning well Figure 4(a) shows the average Q-values of HON and TXN as training progressed We findthat the average Q-values steadily increased indicating thatour proposed model is properly trained In addition weprovide a positive reward when the portfolio closes and anegative reward when the portfolio reaches the stop-lossthreshold or exits Figure 4(b) shows the ratio of the numberof portfolio positions as training progressed The ratio ofclosed to open portfolio positions increased and the ratioof portfolios reaching their stop-loss thresholds to openportfolio positions decreased We also find that the ratio ofportfolio exits to open portfolio positions slightly increasedIt is possible that the rewards given for an open portfolioposition compared to those given for a closed portfolioposition are relatively small The DQN is therefore trainedto prevent portfolios from reaching their stop-loss thresholds(the more important objective) over exiting them This resultcan also serve as a basis for judging whether the proposedmodel is being trained properly

Tables 6 and 7 represent the performance results of XOMand CVX in the training dataset We call our proposedmodel pairs-trading DQN (PTDQN) and traditional pairstrading with constant action values as pairs trading withaction 0 (PTA0) to pairs trading with action 5 (PTA5) Fromthis result we can confirm that our proposed method ismore profitable than the constant pairs-trading strategiesIn addition we can see that the TLS method has a higherprofitability compared to the OLS method From PTA0 toPTA5 the trading boundary and the stop-loss boundarygrew larger the numbers of open and closed portfolios andportfolios that reached their stop-loss thresholds are reducedIn other words there is less opportunity for profit but theprobability of loss is also reduced It is important not only totake a lot of closed positions but also to take the best actionto open and close the portfolio For example if a portfolio is

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 8: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

8 Complexity

Table 2 Summary statistics for pairs verified using cointegrationtests

No Pairs t-statistic Correlation1 MSFTJPM minus35423lowastlowast 091652 MSFTTXN minus3448lowastlowast 086413 BRKaABT minus35148lowastlowast 094934 BRKaUTX minus33992lowastlowast 096095 JPMT minus35882lowastlowast 084866 JPMHON minus58209lowast lowast lowast 092507 JPMGE minus34494lowastlowast 091058 JNJWFC minus35696lowastlowast 096939 XOMCVX minus405lowast lowast lowast 0987910 HONTXN minus40625lowast lowast lowast 0746911 GETXN minus3467lowastlowast 09148Note lowast lowast lowast and lowastlowast denote a rejection of the null hypothesis at the 1 and5 significance levels respectively

119878ℎ119886119903119901119890 119903119886119905119894119900 = 119877119901 minus 119877119891120590119901 (23)

The Materials and Methods section should contain sufficientdetails so that all procedures can be repeated It may bedivided into headed subsections if several methods aredescribed

4 Results and Discussion

We use the stock pair XOM and CVX which rejects the nullhypothesis at the 1 significance level to verify whether ourproposed model is trained well The lengths of the windowsizes such as the formation window and trading windoware selected from the performance results with the trainingdataset From these results we select an optimized windowsize and compare our proposed model with traditional pairstrading which takes a constant set of actions with the testdataset

41 Training Results To find the optimum window sizefor the optimized pairs-trading system we experimentedwith six cases We performed the experiments based onsix window sizes and the results for each window size arecalculated by averaging the top-5 results for a total of 11 pairsFrom Tables 4 and 5 we can find that the best performanceis obtained when the formation and training windows are 30and 15 respectively based on the profit generated by both theOLS and TLS methods When we trained our networks weset a positive reward for taking more closed positions andfewer stop-loss and exit positionsWe canfind the lowest ratioof portfolio closed positions based on the number of openpositions which in the formation and trading windows arefor 30 and 15 days (068) Contrary to this result the highestratios of the number of closed positions in the formation andtrading windows are for 120 and 60 days (073) Howeverthe highest profits reported in the formation and tradingwindows are for 30 and 15 days This can be explained whenwe check the ratio of the number of stop-loss portfolios

The formation and trading window sizes are 30 and 15 daysand the ratio of portfolio stop-loss position is 013 but theformation and trading window sizes are 020 This resultindicates that it is important to reduce the stop-loss positionwhile increasing the closed position In addition we can seethat the trading signals made with the TLS method are betterthan thosemadewith theOLSmethod in all six of the discretewindow sizes The reason for this is based on the differencebetween the hedge ratios of the two methods In OLS whenone side is the reference the relative change of the other sideis estimated Since the assumption is that there is no errorcomponent on the reference side and there is an error onlyon the other side the hedge ratio varies depending on the sideused as the reference However in TLS hedging ratios are thesame regardless of which side is used as the reference For thisreason the experimental results confirm that the TLSmethodis better able to determine when to execute the pairs-tradingstrategy From these results we take the optimum windowsize when we verify our proposed method in the test datasetHowever we first need to ensure that the model we proposedis well-trained

It is important to check whether our reinforcementlearning algorithm is trained well Reference [21] suggestedthat a steadily increasing average of Q-values is evidence thatthe DQN is learning well Figure 4(a) shows the average Q-values of HON and TXN as training progressed We findthat the average Q-values steadily increased indicating thatour proposed model is properly trained In addition weprovide a positive reward when the portfolio closes and anegative reward when the portfolio reaches the stop-lossthreshold or exits Figure 4(b) shows the ratio of the numberof portfolio positions as training progressed The ratio ofclosed to open portfolio positions increased and the ratioof portfolios reaching their stop-loss thresholds to openportfolio positions decreased We also find that the ratio ofportfolio exits to open portfolio positions slightly increasedIt is possible that the rewards given for an open portfolioposition compared to those given for a closed portfolioposition are relatively small The DQN is therefore trainedto prevent portfolios from reaching their stop-loss thresholds(the more important objective) over exiting them This resultcan also serve as a basis for judging whether the proposedmodel is being trained properly

Tables 6 and 7 represent the performance results of XOMand CVX in the training dataset We call our proposedmodel pairs-trading DQN (PTDQN) and traditional pairstrading with constant action values as pairs trading withaction 0 (PTA0) to pairs trading with action 5 (PTA5) Fromthis result we can confirm that our proposed method ismore profitable than the constant pairs-trading strategiesIn addition we can see that the TLS method has a higherprofitability compared to the OLS method From PTA0 toPTA5 the trading boundary and the stop-loss boundarygrew larger the numbers of open and closed portfolios andportfolios that reached their stop-loss thresholds are reducedIn other words there is less opportunity for profit but theprobability of loss is also reduced It is important not only totake a lot of closed positions but also to take the best actionto open and close the portfolio For example if a portfolio is

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 9: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 9

Table 3 Setting a discrete action space

ActionA0 A1 A2 A3 A4 A5

Trading boundary plusmn 05 plusmn 10 plusmn 15 plusmn 20 plusmn 25 plusmn 30Stop-loss boundary plusmn 25 plusmn 30 plusmn 35 plusmn 40 plusmn 45 plusmn 50

Table 4 Results of applying the DQN method to each window size using OLS

Formationwindow

Tradingwindow MDD Sharpe

ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of portfolioexits

30 15 minus03682 01197 27344 328 225 44 5860 30 minus03779 01327 25627 210 147 41 2190 45 minus04052 01409 24112 160 114 34 11120 60 minus04383 01165 20287 134 98 28 8150 75 minus04395 01244 20098 110 80 24 6180 90 minus05045 01180 19390 100 73 21 5

Table 5 Results of applying the DQN method to each window size using TLS

Formationwindow

Tradingwindow

MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

30 15 minus04422 01061 29436 320 229 46 4460 30 minus05031 01143 25806 204 144 42 1790 45 minus05824 01072 24588 155 110 36 9120 60 minus05768 01181 24378 136 98 31 6150 75 minus05805 01245 24127 110 79 26 5180 90 minus05467 01209 23570 100 72 23 4

Table 6 Average top-5 performance results for XOM and CVX using OLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00842 01835 34068 469 336 64 96PTA0 minus02014 01452 25934 565 382 132 50PTA1 minus01431 01773 27603 409 279 45 84PTA2 minus01234 01955 26307 325 191 16 118PTA3 minus02586 00861 13850 208 86 2 120PTA4 minus02591 00803 11933 124 39 2 83PTA5 minus02448 minus00638 08588 47 11 0 36

Table 7 Average top-5 performance results for XOM and CVX using TLS within the training period

Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

PTDQN minus00944 02133 48760 541 399 104 63PTA0 minus01210 01522 41948 579 413 125 41PTA1 minus01015 01650 38834 430 310 50 70PTA2 minus01483 01722 33425 320 209 13 98PTA3 minus01386 01771 24385 217 101 3 113PTA4 minus01749 01602 16852 119 38 2 79PTA5 minus02862 00137 10362 55 10 0 45

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 10: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

10 Complexity

0

10000

8000

6000

4000

2000

Epoch200

minus2000

minus4000

minus6000

Aver

age o

f Q_v

alue

1751501251007550250

Avg_Q_value

(a)

Epoch2001751501251007550250

07

06

05

04

03

02

01

Ratio

of p

ortfo

lio

close

stopexit

(b)

Figure 4 Verification that our proposedmodel is well-trainedwithHONandTXNusingTLS (a) Average ofQ-values (b) Ratio of portfolios

opened and closed by a boundary corresponding to action 0within the same spread and if a portfolio is opened and closedby a boundary corresponding to action 1 the correspondingprofit is different Assuming that themean reversion is certainto occur if we take the maximum boundary condition toopen a portfolio we will obtain a larger profit than whenwe take a smaller boundary condition We can see that thePTDQN returns are higher than the strategy with the highestreturn among the traditional pairs trading strategies that takethe constant action Figures 5ndash8 show the changes in tradingand stop-loss boundaries and the highest profit for constantaction when applying the DQN method during the trainingperiod using OLS and TLS

Figures 5 and 6 show comparisons of PTDQN andPTA1 using the TLS method Figure 5 consists of the spreadtrading and stop-loss boundaries We find that tradingand stop-loss boundaries have different values in PTDQNshowing that it has learned to find the optimal boundary

according to each spread In contrast to PTDQN PTA1in Figure 6 has constant trading and stop-loss boundariesFigures 7 and 8 exhibit the same features we see in Figures5 and 6 The difference between these methods lies in thespreads different results can be obtained depending on thespreads used Making better spreads can therefore improveperformance

Figures 9 and 10 represent the profit corresponding toDQN and constant actions using TLS and OLS Reference[34] suggested that an average value over multiple trialsshould be presented to show the reproducibility of deepreinforcement learning because there may be different resultsfrom high variances across trials and random seeds Wetherefore conducted five trials with different random seedsThe profit graph of DQN represents the average profit ofthese trials and the filled region between the maximum andminimumprofit values We can see that PTDQNhad a higherprofit than the traditional pairs-trading strategies during

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 11: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 11

Z-sc

ore

trading signaltrading boundarystop-loss boundary

trading signaltrading boundarystop-loss boundary

Year

Year

8

6

4

2

0

minus2

minus4

minus6

minus8

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

2008200620042002200019981996199419921990

1990-01 1990-071990-04 1990-10 1991-01 1991-04 1991-07 1991-10 1992-01 1992-04

Figure 5 An example of optimizing pairs trading using PTDQN based on a training scheme using TLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 6 An example of PTA1 based on a training scheme using TLS

the training period This means that even with the samespread we can see how profit will change as the boundariesare changed In other words finding the optimal boundaryfor the spread is an important factor in optimizing theprofitability of pairs trading

42 Test Results Tables 8 and 9 show the average perfor-mance measures of each pair tested by applying the top-5trained models We can see that the constant action with

the highest returns for each pair is different and the TLSmethod is higher in all pairs than the OLS method basedon profit as shown above We also find that PTDQN hasbetter performance than traditional pairs-trading strategiesThe pair with the highest profit using the proposedmethod isHON and TXN (32755) it also shows the biggest differencebetween the DQN method and the optimal constant action(09377) We find that the proposed method has a higherSharpe ratio in all pairs except for MO and UTX when the

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 12: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

12 Complexity

trading signaltrading boundarystop-loss boundary

Year2008200620042002200019981996199419921990

minus8

minus6

minus4

minus2

0

2

4

6

8

Z-sc

ore

Figure 7 An example of optimizing PTDQN based on a training scheme using OLS

trading signaltrading boundarystop-loss boundary

Z-sc

ore

8

6

4

2

0

minus2

minus4

minus6

minus8

Year2008200620042002200019981996199419921990

Figure 8 An example of PTA0 based on a training scheme using OLS

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

5

4

3

2

1

Figure 9 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using TLS in training periods

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 13: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 13

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2008200620042002200019981996199419921990

10

15

20

25

30

35

40

45

50

Profi

t

Figure 10 Average top-5 profits generated by PTDQN and traditional pairs-trading strategies using OLS in training periods

TLS method is used If we add the Sharpe ratio in addition tothe total profit as an objective function we can build a moreoptimized pairs-trading system Based on these results wecan ensure the robustness of our proposed method for ourdataset The proposed method can be applied to other pairsof stocks found in other global markets

In Figure 11 we can see that our proposed methodPTDQN outperforms the traditional pairs trading strategiesthat have constant actions in test dataset The crucial aspectof this method is the selection of optimal boundary in thespread that makes the highest profit in constant action whichis like a constant boundary Therefore the trend is the sameas traditional pairs trading strategies however when theoptimal boundaries which have the highest profit in thespread are combined PTDQN is found to have higher profitthan traditional pairs trading strategies This method cantherefore be applied in various fields when there is a needto optimize the efficiency of a rule-based strategy [35 36]In this study we consider spread and boundaries to be theimportant factors of pairs trading strategyTherefore we triedto optimize pairs trading strategy with various trading andstop-loss boundaries using deep reinforcement learning andourmethod outperforms rule-based strategies By optimizingkey parameters in rule-based methods it can improve theperformances

Pairs trading uses two types of stock which have the sametrends However it can be broken due to various factors suchas economic issues and company risk In this situation thespread between two stocks is extremely large Although thissituation cannot be avoided we hedge this risk by takinga dynamic boundary In this sense taking the lowest stop-loss boundary is the best choice since it can be overcomewith the least loss By taking the dynamic boundary usingthe deep reinforcement learning method we can see that notonly profits are increased but losses are also minimized ascompared to taking a fixed boundary

5 Conclusions

We propose a novel approach to optimize pairs trad-ing strategy using a deep reinforcement learning method

especially deep Q-networks There are two key researchquestions posed First if we set a dynamic boundary based ona spread in each trading window can it achieve higher profitthan traditional pairs trading strategy Second is it possiblethat deep reinforcement learning method can be trainedto follow this mechanism To investigate these questionswe collected pairs selected using the cointegration test Weexperimented with how the results varied according to thespread and the method used We therefore set differentspreads using OLS and TLSmethods as the input of the DQNand the trading signal To conduct this experiment we setup a formation window and a trading window The hedgeratio which is an important factor in determining howmuchstock to take depends on this value We therefore appliedthe OLS and TLS methods and experimented to find theoptimal window size by varying the formation window andthe trading window

Tables 6 and 7 show the average performance values ofthe formation windows and trading windows in the trainingdataset The results show that all six window sizes werehigher when TLS spreads were used than in OLS spreadsIn addition we can see that profitability gradually increasesas the estimation windows and trading windows of methodsusing TLS and OLS decreased The reason is that althoughthe ratio of closed position portfolio is the lowest in whatwe set formation and trading windows the ratio of stop-loss position portfolio is also the lowest compared with otherformation and trading windows It means that reducing stop-loss position portfolio is important as well as increasingclosed position portfolio to make a profit Using the optimalwindow size we then check whether our DQN is properlytrained At each epoch we find that the average Q-valuesteadily increased the ratio of closed portfolios increasedand the ratio of portfolios that reached their stop-lossthresholds decreased confirming that our DQN is trainedwell Based on these results we find that our proposed modelusing the test dataset with a formation window of 30 anda trading window of 15 had results that were superior tothose of traditional pairs-trading strategies in the out-of-sample dataset In Figure 11 we can see that the profit path ofPTDQN is similar PTA0 to PTA5 but better than that from

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 14: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

14 ComplexityPr

ofit

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

30

35

25

20

15

10

05

(a) MSFTJPM

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

00

05

10

15

20

(b) MSFTTXN

Profi

t

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

250

225

200

175

150

125

100

(c) BRKaABT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

15

10

05

00

(d) BRKaUTX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

16

14

12

10

08

Profi

t

(e) JPMT

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(f) JPMHON

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

20

15

10

(g) JPMGE

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

Profi

t

(h) JNJWFC

Figure 11 Continued

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 15: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 15

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

12

10

14

08

06

04

(i) XOMCVX

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

Profi

t

30

25

35

20

15

05

10

(j) HONTXN

PTDQNPTA0PTA1PTA2PTA3PTA4PTA5

Year2018201720162015201420132012201120102009

175

200

225

250

150

125

100

075

Profi

t

(k) GETXN

Figure 11 Average top-5 profits of PTDQN and PTA0 to PTA5 using TLS with the test dataset

other methods This shows that taking dynamic boundariesbased on our method is efficient in optimizing the pairstrading strategy During economic issues uncertainties it canbe a risk to manage the pairs trading strategies includingour proposed method However we set a reward functionif spread is suddenly high and our network is trained toprevent this situation by taking less stop-loss boundary sinceit is trained to maximize the expected sum of future rewardsTherefore our proposed method can minimize the risk whenthe economic risks appeared compared with traditional pairstrading strategy with fixed boundary

From the experimental results we show that our methodcan be applied in the pairs trading system It can be appliedin various fields including finance and economics whenthere is a need to optimize the efficiency of a rule-basedstrategy Furthermore we find that our method outperformsthe traditional pairs trading strategy in all pairs based onconstituent stocks in SampP 500 If we select appropriate pairswhich are cointegrated we can apply our methods to othermarkets such as KOSPI Nikkei and Hang Seng The studyfocused on only spreads made by two stocks which havelong-term equilibrium patterns Since our method selectsoptimal boundaries based on spreads it can be appliedto other stock markets such as KOSPI Nikkei and HangSeng

In future works we can develop our proposed model asfollows First as profit was set as the objective function in thisstudy the performance of the model is lower than traditionalpairs trading when based on other performance measures Itcan therefore be possible to create a better-optimized pairs-trading strategy by including all these other performanceindicators as part of the objective function Second we canuse other statistical methods such as the Kalman filter anderror-correction models to use diversified spreads Finally itis possible to create a more-optimized pairs-trading strategyby continuously changing the discrete set of window sizesand boundaries We will solve these difficulties in futurestudies

Data Availability

The data used to support the findings of this study havebeen deposited in the figshare repository (DOI 106084m9figshare7667645)

Disclosure

The funders had no role in the study design data collec-tion and analysis decision to publish or preparation ofthe manuscript This work represents a part of the study

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 16: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

16 Complexity

Table 8 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing TLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus01122 02294 30446 186 126 38 62PTA0 minus03411 00742 16236 211 136 57 18PTA1 minus02907 00979 18001 162 104 26 32PTA2 minus01507 01936 26303 131 64 7 60PTA3 minus04032 01542 18282 97 39 1 57PTA4 minus04340 00400 10480 55 13 0 42PTA5 minus01836 03098 15524 30 7 0 23

MSFTTXN

PTDQN minus03420 01001 15423 204 132 47 65PTA0 minus12094 minus00571 00013 244 152 76 16PTA1 minus09225 minus00177 06131 178 110 25 43PTA2 minus05574 00351 10887 134 68 8 58PTA3 minus05375 minus00128 08326 97 34 1 62PTA4 minus04485 00260 10118 66 15 1 50PTA5 minus01048 01233 11502 32 5 0 27

BRKaABT

PTDQN minus00740 03159 23655 162 111 30 43PTA0 minus01392 01554 17157 182 128 35 18PTA1 minus01048 02464 21508 138 96 15 27PTA2 minus01133 02538 19578 108 64 3 40PTA3 minus01040 02480 17576 76 35 1 40PTA4 minus00829 02087 13171 44 13 0 31PTA5 minus00704 04366 14013 19 7 0 12

BRKaUTX

PTDQN minus05401 01174 15744 167 105 35 58PTA0 minus12143 minus00199 05918 192 117 55 19PTA1 minus09340 00346 10701 147 89 12 45PTA2 minus09099 minus00009 08435 122 60 5 57PTA3 minus05673 00473 11520 89 32 1 56PTA4 minus03641 00694 11628 53 9 0 44PTA5 minus02309 00408 10405 18 3 0 15

JPMT

PTDQN minus01384 01283 14653 175 113 42 53PTA0 minus03630 00071 08968 205 129 60 15PTA1 minus02801 00460 11595 144 94 17 32PTA2 minus03750 00192 09987 119 62 5 51PTA3 minus05241 minus00717 06609 92 35 0 56PTA4 minus03607 minus00550 08411 56 18 0 38PTA5 minus02235 00061 09851 22 6 0 16

JPMHON

PTDQN minus01872 01523 22510 223 155 39 62PTA0 minus06769 00190 10077 274 180 70 23PTA1 minus04644 00622 16331 201 139 24 38PTA2 minus04537 00840 17165 149 87 2 60PTA3 minus02410 01414 17648 107 43 0 64PTA4 minus03313 00879 13150 62 16 0 46PTA5 minus01693 01803 12777 28 7 0 21

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 17: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 17

Table 8 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01098 02123 28250 193 124 46 65PTA0 minus03897 00507 15137 224 142 65 17PTA1 minus03404 00640 16912 163 109 18 36PTA2 minus01628 01284 19032 132 73 6 53PTA3 minus02980 01142 17555 106 38 1 67PTA4 minus02817 00790 12884 55 13 0 42PTA5 minus00612 04776 17489 21 6 0 15

JNJWFC

PTDQN minus01576 02437 23741 143 100 28 38PTA0 minus02872 00892 14932 164 115 37 12PTA1 minus02219 01948 21147 127 90 15 21PTA2 minus03188 01322 16362 99 55 5 38PTA3 minus02324 01084 13141 68 27 0 41PTA4 minus01532 01043 11228 40 14 0 26PTA5 minus00970 01203 10734 16 6 0 10

XOMCVX

PTDQN minus04265 00605 11924 218 135 45 77PTA0 minus06189 00236 08812 256 161 67 28PTA1 minus05999 00154 08809 197 118 25 54PTA2 minus06034 minus00073 07792 153 70 8 75PTA3 minus05628 minus00224 07734 114 38 2 74PTA4 minus05311 minus00200 08643 70 18 1 51PTA5 minus02583 00060 09692 31 4 0 27

HONTXN

PTDQN minus00874 02679 32755 233 164 49 63PTA0 minus05108 01080 19219 276 186 66 23PTA1 minus05841 01625 23378 207 140 28 38PTA2 minus01926 02086 23096 158 92 4 62PTA3 minus01611 01557 17100 114 49 2 63PTA4 minus01254 02289 16374 69 23 0 46PTA5 minus01578 01924 11925 28 9 0 19

GETXN

PTDQN minus01133 01871 21398 172 117 30 48PTA0 minus03348 00967 16398 201 136 44 21PTA1 minus01656 01070 16355 153 101 19 33PTA2 minus02043 01388 17568 117 68 8 41PTA3 minus02335 01591 15555 89 39 2 48PTA4 minus03847 minus01355 06570 45 7 0 38PTA5 minus03489 minus02730 07218 21 2 0 19

MOUTX

PTDQN minus05264 00840 12940 150 88 35 58PTA0 minus10950 minus00272 06231 178 102 56 19PTA1 minus07205 00286 10362 125 73 12 39PTA2 minus08361 minus00040 08658 105 51 3 50PTA3 minus04311 00052 09323 79 24 0 54PTA4 minus03916 01141 12129 48 12 0 36PTA5 minus01311 02948 11276 14 3 0 11

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 18: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

18 Complexity

Table 9 Average top-5 performance results of the proposed method and the traditional pairs-trading strategy in the out-of-sample datasetusing OLS

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

MSFTJPM

PTDQN minus02096 01228 19255 215 137 54 62PTA0 minus03618 00492 13365 225 141 61 23PTA1 minus05036 00188 10185 168 102 28 38PTA2 minus04045 00611 13591 124 59 8 57PTA3 minus05055 minus00094 08636 97 33 3 61PTA4 minus04195 minus00009 09459 58 12 1 45PTA5 minus02018 01236 11593 29 6 0 23

MSFTTXN

PTDQN minus02878 00698 13466 244 153 65 68PTA0 minus05271 00070 08489 252 156 72 24PTA1 minus04721 00255 10286 187 117 26 44PTA2 minus03816 00215 09912 145 71 10 64PTA3 minus06553 minus01015 05053 104 30 2 72PTA4 minus02719 00422 10532 63 16 1 46PTA5 minus01850 00068 09785 34 7 0 27

BRKaABT

PTDQN minus01282 01644 15076 180 109 48 57PTA0 minus05073 minus00265 07070 183 112 48 22PTA1 minus02649 00453 10786 139 80 13 46PTA2 minus02246 01056 12942 121 60 4 56PTA3 minus01686 01241 12718 91 38 1 52PTA4 minus01483 00176 09778 49 12 0 37PTA5 minus01602 00004 09830 16 2 0 14

BRKaUTX

PTDQN minus05231 00816 12976 215 132 57 69PTA0 minus11928 minus00647 03332 216 133 57 25PTA1 minus08697 minus00157 07445 167 100 15 51PTA2 minus07815 minus00071 08391 135 70 5 60PTA3 minus03573 00315 10292 94 36 0 58PTA4 minus02096 00684 10857 52 11 0 41PTA5 minus01317 minus01174 09312 16 2 0 14

JPMT

PTDQN minus01338 01391 14547 205 127 60 50PTA0 minus03588 00069 09054 208 130 61 16PTA1 minus02535 00405 10902 151 96 19 35PTA2 minus01872 00542 11198 119 66 5 48PTA3 minus02574 00336 10502 94 39 0 55PTA4 minus02212 00345 10312 57 20 0 37PTA5 minus02348 minus01922 08299 20 5 0 15

JPMHON

PTDQN minus03869 01071 15175 250 162 57 68PTA0 minus07141 00181 09444 256 166 59 30PTA1 minus05065 00702 13071 198 127 22 49PTA2 minus04649 01071 14260 152 84 3 65PTA3 minus04871 00763 12098 102 44 0 58PTA4 minus03503 minus00694 08178 50 13 0 37PTA5 minus02980 minus01721 08040 23 6 0 17

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 19: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Complexity 19

Table 9 Continued

Pairs Model MDD Sharpe ratio Profit of openportfolios

of closedportfolios

of stop-lossportfolios

of exitedportfolios

JPMGE

PTDQN minus01195 01443 17682 226 133 64 69PTA0 minus04379 00036 08549 232 137 66 29PTA1 minus01523 00987 14814 165 98 16 51PTA2 minus01738 01264 15661 134 62 5 67PTA3 minus02680 00729 12026 93 29 0 64PTA4 minus02104 01298 13242 51 12 0 39PTA5 minus01461 minus00423 09586 18 3 0 15

JNJWFC

PTDQN minus01890 01266 17194 202 130 47 56PTA0 minus08705 minus00326 04635 207 131 53 22PTA1 minus06189 minus00134 07318 150 91 19 39PTA2 minus04763 00309 10563 124 57 4 62PTA3 minus02318 01447 16072 97 33 2 62PTA4 minus02415 00549 10632 50 13 0 37PTA5 minus00880 02468 11886 20 4 0 16

XOMCVX

PTDQN minus03316 00265 11517 141 81 23 43PTA0 minus07629 minus00547 04186 240 149 61 30PTA1 minus05648 00132 08754 193 114 23 56PTA2 minus06977 minus00387 06655 154 70 7 77PTA3 minus05235 00277 09865 117 38 1 78PTA4 minus04781 minus00577 08117 63 12 1 50PTA5 minus03787 minus01492 08090 29 3 0 26

HONTXN

PTDQN minus01339 01534 18852 270 175 64 69PTA0 minus04135 00212 09455 276 177 70 28PTA1 minus02758 00666 13216 207 124 27 55PTA2 minus02614 01054 15031 159 84 5 69PTA3 minus01759 01413 15617 117 45 2 70PTA4 minus00834 02650 17044 66 23 0 43PTA5 minus00664 04606 16830 30 13 0 17

GETXN

PTDQN minus01676 01263 16411 206 140 43 62PTA0 minus06133 00178 09742 211 144 44 23PTA1 minus03085 00586 12743 166 109 19 38PTA2 minus02402 00585 12216 128 68 5 55PTA3 minus03190 minus00013 09193 91 31 2 58PTA4 minus02493 minus00285 09117 49 8 0 41PTA5 minus00862 01417 10936 23 4 0 19

MOUTX

PTDQN minus03181 00524 11402 188 117 49 59PTA0 minus04688 00041 08667 195 121 52 21PTA1 minus06166 minus00230 07470 144 84 13 46PTA2 minus05034 minus00076 08666 115 51 4 59PTA3 minus02833 00457 10873 88 32 0 56PTA4 minus02901 00356 10280 44 12 0 32PTA5 minus01500 00992 10297 13 2 0 11

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 20: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

20 Complexity

conducted as aMasterThesis in Financial Engineering during2016 and 2018 at the University of Ajou Republic of Korea

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Research Foun-dation of Korea (NRF) grant funded by the Korea Gov-ernment (MSIT Ministry of Science and ICT) (No NRF-2017R1C1B5018038)

References

[1] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo YaleICF Working Paper No 08-03 1998 httpsssrncomabstract=141615 or httpdxdoiorg102139ssrn141615

[2] R J Elliott J van der Hoek andW P Malcolm ldquoPairs tradingrdquoQuantitative Finance vol 5 no 3 pp 271ndash276 2005

[3] S Andrade V Di Pietro andM Seasholes ldquoUnderstanding theprofitability of pairs tradingrdquo 2005

[4] G Hong and R Susmel ldquoPairs-trading in the Asian ADRmarketrdquo Univ Houston Unpubl Manuscr 2003

[5] E Gatev W N Goetzmann and K G Rouwenhorst ldquoPairstrading performance of a relative-value arbitrage rulerdquo Reviewof Financial Studies vol 19 no 3 pp 797ndash827 2006

[6] B Do and R Faff ldquoDoes simple pairs trading still workrdquoFinancial Analysts Journal vol 66 no 4 pp 83ndash95 2018

[7] S Mudchanatongsuk J A Primbs and W Wong ldquoOptimalpairs trading A stochastic control approachrdquo in Proceedingsof the 2008 American Control Conference ACC pp 1035ndash1039USA June 2008

[8] A Tourin and R Yan ldquoDynamic pairs trading using the sto-chastic control approachrdquo Journal of Economic Dynamics ampControl vol 37 no 10 pp 1972ndash1981 2013

[9] Z Zeng and C Lee ldquoPairs trading optimal thresholds andprofitabilityrdquo Quantitative Finance vol 14 no 11 pp 1881ndash18932014

[10] S Fallahpour H Hakimian K Taheri and E RamezanifarldquoPairs trading strategy optimization using the reinforcementlearning method a cointegration approachrdquo So Computingvol 20 no 12 pp 5051ndash5066 2016

[11] P Nath ldquoHigh frequency pairs trading with US treasurysecurities risks and rewards for hedge fundsrdquo SSRN ElectronicJournal 2004

[12] T Leung and X Li ldquoOptimal mean reversion trading withtransaction costs and stop-loss exitrdquo International Journal of13eoretical and Applied Finance vol 18 no 3 2013

[13] E Ekstrom C Lindberg and J Tysk ldquoOptimal liquidation ofa pairs traderdquo in Advanced Mathematical Methods for Financepp 247ndash255 Springer Heidelberg 2011

[14] Y Lin M McCrae and C Gulati ldquoLoss protection in pairstrading through minimum profit bounds A cointegrationapproachrdquo Journal of Applied Mathematics and Decision Sci-ences vol 2006 pp 1ndash14 2006

[15] A Mikkelsen ldquoPairs trading the case of Norwegian seafoodcompaniesrdquo Applied Economics vol 50 no 3 pp 303ndash318 2017

[16] K Kim ldquoPerformance analysis of pairs trading strategyutilizinghigh frequency data with an application to KOSPI 100 EquitiesrdquoSSRN Electronic Journal p 24 2011

[17] V Holy and P Tomanova Estimation of Ornstein-UhlenbeckProcess Using Ultra-High-Frequency Data with Application toIntraday Pairs Trading Strategy 2018

[18] D Chen J Cui Y Gao and L Wu ldquoPairs trading in Chi-nese commodity futures markets an adaptive cointegrationapproachrdquo Accounting amp Finance vol 57 no 5 pp 1237ndash12642017

[19] H Puspaningrum Y Lin and C M Gulati ldquoFinding theoptimal pre-set boundaries for pairs trading strategy basedon cointegration techniquerdquo Journal of Statistical 13eory andPractice vol 4 no 3 pp 391ndash419 2010

[20] A A Roa ldquoPairs trading optimal thershold strategiesrdquo 2018[21] V Mnih K Kavukcuoglu D Silver et al ldquoPlaying atari with

deep reinforcement learningrdquo httpsarxivorgabs131256022013

[22] Y Wang DWang S Zhang Y Feng S Li and Q Zhou ldquoDeepQ-tradingrdquo 2017 httpcsltriittsinghuaeducn

[23] C-YHuang ldquoFinancial trading as a game a deep reinforcementlearning approachrdquo 2018 httpsarxivorgabs180702787

[24] T Kim Optimizing the pairs trading strategy using Deepreinforcement learning [MS thesis] Ajou University SuwonRepublic of Korea 2019

[25] B Do R Faff and K Hamza ldquoA new approach to modelingand estimation for pairs tradingrdquo in Proceedings of the 2006Financial Management Association European Conference 2006

[26] R D Dittmar C J Neely and P A Weller ldquoIs technicalanalysis in the foreign exchange market profitable A geneticprogramming approachrdquo Journal of Financial and QuantitativeAnalysis vol 43 p 43 1997

[27] H Rad R K Low and R Faff ldquoThe profitability of pairstrading strategies distance cointegration and copula methodsrdquoQuantitative Finance vol 16 no 10 pp 1541ndash1558 2016

[28] S Johansen ldquoStatistical analysis of cointegration vectorsrdquo Jour-nal of Economic Dynamics and Control vol 12 no 2-3 pp 231ndash254 1988

[29] M H Kutner C J Nachtsheim J Neter and W Li ldquoAppliedlinear statistical modelsrdquo 1996

[30] G H Golub and C F Van Loan ldquoAn analysis of the total leastsquares problemrdquo SIAM Journal on Numerical Analysis vol 17no 6 pp 883ndash893 1980

[31] R S Sutton and A G Barto ldquoIntroduction to reinforcementlearningrdquo Learning 1998

[32] E F Fama ldquoRandom walks in stock market pricesrdquo FinancialAnalysts Journal vol 51 no 1 pp 75ndash80 1995

[33] W F Sharpe ldquoThe sharpe ratiordquo 13e Journal of PortfolioManagement 1994

[34] P Henderson R Islam P Bachman J Pineau D Precupand D Meger ldquoDeep reinforcement learning that mattersrdquo inProceedings of the13irthy-Second AAAIConference OnArtificialIntelligence (AAAI) 2018

[35] Y H Li X M Lu and N C Kar ldquoRule-based control strategywith novel parameters optimization using NSGA-II for power-split PHEV operation costminimizationrdquo IEEE Transactions onVehicular Technology vol 63 no 7 pp 3051ndash3061 2014

[36] L Dymova P Sevastianov and K Kaczmarek ldquoA stock tradingexpert system based on the rule-base evidential reasoning usingLevel 2 Quotesrdquo Expert Systems with Applications vol 39 no 8pp 7150ndash7157 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 21: Optimizing the Pairs-Trading Strategy Using Deep ...downloads.hindawi.com/journals/complexity/2019/3582516.pdf · pairs trading strategy using deep reinforcement learning, especiallydeepQ-networks,sincepairstradingstrategycan

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom