An automated trading system based on Recurrent Reinforcement Learning
Students: Lior Kupfer, Pavel Lifshits
Supervisor: Andrey Bernstein
Advisor: Prof. Nahum Shimkin
Technion – Israel Institute of Technology, Faculty of Electrical Engineering
Control and Robotics Laboratory
Outline
• Introduction
• Notations
• The System
• The Learning Algorithm
• Project Goals
• Results
  ◦ Artificial Time Series (the AR case)
  ◦ Real Foreign Exchange / Stock Data
• Conclusions
• Future Work
L. Kupfer & P. Lifshits: "An automated trading system based on Recurrent Reinforcement Learning", Technion – Israel Institute of Technology, Faculty of Electrical Engineering, Control and Robotics Laboratory.
Introduction
Using machine learning methods for trading
◦ A relatively new approach to financial trading
◦ Learning algorithms are used to predict the rise and fall of asset prices before they occur
◦ An optimal trader would buy an asset before its price rises, and sell the asset before its value declines
Trading technique
◦ An asset trader was implemented using recurrent reinforcement learning (RRL), as suggested by Moody and Saffell (2001) [1]
◦ It is a gradient ascent algorithm which attempts to maximize a utility function known as the Sharpe ratio
◦ We denote by $\theta$ the parameter vector which completely defines the actions of the trader
◦ By choosing an optimal parameter vector for the trader, we attempt to take advantage of asset price changes
Notations
Due to transaction costs, which include
◦ Commissions
◦ Bid/ask spreads
◦ Price slippage
◦ Market impact
our constraints are
◦ We cannot trade arbitrarily frequently
◦ We cannot make large changes in portfolio composition
Model assumptions
◦ Fixed position size
◦ Single security
◦ $Z = \{z_1, z_2, \ldots, z_T\}$ – the price series
◦ $r_t = z_t - z_{t-1}$ – the corresponding price changes
◦ $F_t \in \{-1, 0, +1\}$ – our position at each time step (short, neutral, long)
◦ $\mu$ – the fixed quantity of securities traded, $\delta$ – the commission rate
◦ $R_t = \mu \left( F_{t-1}\, r_t - \delta \left| F_t - F_{t-1} \right| \right)$ – the system return at each time step
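To make the bookkeeping concrete, here is a minimal pure-Python sketch of the per-step return; function and variable names are ours, not from the project:

```python
def trading_returns(z, F, mu=1.0, delta=0.001):
    """R_t = mu * (F[t-1]*r_t - delta*|F[t] - F[t-1]|), with r_t = z[t] - z[t-1].

    z : price series z_0..z_T, F : positions F_0..F_T in {-1, 0, +1},
    mu : quantity of securities, delta : commission rate.
    """
    R = []
    for t in range(1, len(z)):
        r_t = z[t] - z[t - 1]                       # price change at step t
        R.append(mu * (F[t - 1] * r_t - delta * abs(F[t] - F[t - 1])))
    return R
```

Note that the return earned at step $t$ uses the position held over the previous interval, $F_{t-1}$, while the commission is charged on the position change made at step $t$.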
◦ $P_T$ – the additive profit accumulated over $T$ time periods:
$P_T = \sum_{t=1}^{T} R_t, \qquad P_0 = 0 \ (\text{and usually } F_0 = F_T = 0)$
◦ $U$ – the performance criterion: $U_T = U(R_1, R_2, \ldots, R_T)$
◦ $D_t = U_t - U_{t-1}$ – the marginal increase in the performance
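A small illustrative sketch of the additive profit and a classic Sharpe-ratio performance criterion over a return sequence (our own code, not the project's):

```python
import math

def additive_profit(R):
    """P_T = sum of the per-step returns R_t (with P_0 = 0)."""
    return sum(R)

def sharpe_ratio(R):
    """Classic (non-differential) Sharpe ratio of a return sequence:
    mean return divided by the standard deviation of returns."""
    T = len(R)
    mean = sum(R) / T
    var = sum((x - mean) ** 2 for x in R) / T
    return mean / math.sqrt(var) if var > 0 else 0.0
```

The differential Sharpe ratio introduced later replaces these batch statistics with exponentially weighted running estimates, so the criterion becomes additive over time.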
The system
Our system is a single-layer recurrent neural network. Formally:
$F_t = F(\theta_{t-1};\, F_{t-1}, I_t) + e_t, \qquad F_t = \tanh(\theta^T x_t)$
where
◦ $\theta = (u, v_0, v_1, \ldots, v_m, w)^T$ – the parameter vector (which we attempt to learn)
◦ $x_t = (1, r_t, r_{t-1}, \ldots, r_{t-m}, F_{t-1})^T$ – the input vector
◦ $I_t$ – the information available at time $t$ (in our case, the price changes)
◦ $e_t$ – a stochastic extension (noise) whose level can be varied to control exploration vs. exploitation
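A minimal sketch of the trader network, assuming the input ordering $x_t = (1, r_t, \ldots, r_{t-m}, F_{t-1})$; the threshold quantization level `q` below is illustrative, not a value from the slides:

```python
import math

def trader_output(theta, r_window, F_prev):
    """F_t = tanh(theta . x_t) with x_t = (1, r_t, ..., r_{t-m}, F_{t-1}).

    theta = (u, v_0, ..., v_m, w): bias u, AR weights v_i, recurrent weight w.
    """
    x = [1.0] + list(r_window) + [F_prev]
    assert len(x) == len(theta)
    return math.tanh(sum(th * xi for th, xi in zip(theta, x)))

def to_position(F, q=0.33):
    """Quantize the continuous output into {-1, 0, +1} (short/neutral/long)."""
    return 1 if F > q else (-1 if F < -q else 0)
```

Feeding the previous position $F_{t-1}$ back as an input is what makes the network recurrent, and is also what lets the trader account for the transaction cost of changing position.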
The learning algorithm
We use reinforcement learning (RL) to adjust the parameters of the system so as to maximize our performance criterion of choice.
RL is an alternative to supervised and unsupervised learning.
RL framework:
◦ Agent & environment
◦ Reward
◦ Expected return
◦ Policy learning
RL modus operandi:
◦ The agent perceives the state of the environment $s_t$ and chooses an action $a_t$. It subsequently observes the new state of the environment $s_{t+1}$ and receives a reward $r_t$.
◦ Aim: learn a policy $\pi$ (a mapping from states to actions) which optimizes the expected return.
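The modus operandi above can be sketched as a generic interaction loop; the environment and policy signatures here are our own illustrative choices:

```python
def run_episode(env_step, policy, s0, T):
    """Generic RL interaction loop: perceive s_t, act a_t, observe s_{t+1}, r_t.

    env_step(s, a) -> (next_state, reward); policy(s) -> action.
    Returns the (undiscounted) return accumulated over T steps.
    """
    s, total = s0, 0.0
    for _ in range(T):
        a = policy(s)             # agent chooses an action in state s
        s, r = env_step(s, a)     # environment transitions and emits a reward
        total += r                # accumulate the return
    return total
```

In the trading setting, the "state" contains the recent price changes and the previous position, the "action" is the new position, and the "reward" is the trading return $R_t$.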
RL approaches
◦ Direct RL: the policy is represented directly. The reward function (immediate feedback) is used to adjust the policy on the fly, e.g. policy search.
◦ Value-function RL: values are assigned to each state (or state-action pair). Values correspond to estimates of future expected returns, or in other words, to the long-term desirability of states. These values help guide the agent towards the optimal policy, e.g. TD-Learning, Q-Learning.
◦ Actor-Critic: the model is split into two parts: the critic, which maintains the state-value estimate $V$, and the actor, which is responsible for choosing the appropriate action at each state.
In RRL we learn the policy by gradient ascent on the performance function $U_t$:
$\theta_t = \theta_{t-1} + \rho \, \frac{dU_t}{d\theta}$
The performance function can be
◦ Profits
◦ Sharpe ratio
◦ Sterling ratio
◦ Downside deviation
Moody suggests an additive and differentiable approximation to the Sharpe ratio – the differential Sharpe ratio $\hat{S}_t$:
$\hat{S}_t = \frac{A_t}{\sqrt{B_t - A_t^2}}$
where
$A_t = A_{t-1} + \eta \left( R_t - A_{t-1} \right)$ – first-order expansion of the exponential moving average of the returns
$B_t = B_{t-1} + \eta \left( R_t^2 - B_{t-1} \right)$ – first-order expansion of the exponential moving average of the squared returns (used to estimate the std)
$\rho$ – learning rate, $\eta$ – decay parameter
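A sketch of one differential-Sharpe update following the moving-average recursions above; the initial values and $\eta$ in the test are illustrative:

```python
def dsr_step(R_t, A_prev, B_prev, eta=0.01):
    """One update of the differential Sharpe ratio (after Moody & Saffell).

    A_t, B_t are exponential moving estimates of the first and second
    moments of the returns; D_t = dS/d(eta) is the marginal utility used
    as the instantaneous performance measure.
    """
    dA = R_t - A_prev
    dB = R_t ** 2 - B_prev
    denom = (B_prev - A_prev ** 2) ** 1.5
    D_t = (B_prev * dA - 0.5 * A_prev * dB) / denom if denom > 0 else 0.0
    A_t = A_prev + eta * dA
    B_t = B_prev + eta * dB
    return D_t, A_t, B_t
```

Because $D_t$ depends only on the current return and the two running moments, the criterion can be evaluated online, one time step at a time, which is what makes it suitable for recurrent gradient ascent.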
Now we develop $\frac{dU_t}{d\theta}$.

Differentiating the Sharpe ratio with respect to the decay parameter gives
$D_t \equiv \frac{d\hat{S}_t}{d\eta} = \frac{B_{t-1}\,\Delta A_t - \frac{1}{2}\,A_{t-1}\,\Delta B_t}{\left( B_{t-1} - A_{t-1}^2 \right)^{3/2}}$
with $\Delta A_t = R_t - A_{t-1}$ and $\Delta B_t = R_t^2 - B_{t-1}$.

◦ Note: to calculate $\frac{dU_t}{d\theta}$ we will need to calculate $\frac{dU_t}{dR_t} = \frac{dD_t}{dR_t}$, where
$\frac{dD_t}{dR_t} = \frac{B_{t-1} - A_{t-1}\,R_t}{\left( B_{t-1} - A_{t-1}^2 \right)^{3/2}}$

The gradient then follows the chain rule through the current and previous positions:
$\frac{dU_t}{d\theta} = \frac{dD_t}{dR_t} \left\{ \frac{dR_t}{dF_t}\,\frac{dF_t}{d\theta} + \frac{dR_t}{dF_{t-1}}\,\frac{dF_{t-1}}{d\theta} \right\}$
where, because the network is recurrent,
$\frac{dF_t}{d\theta} = \frac{\partial F_t}{\partial \theta} + \frac{\partial F_t}{\partial F_{t-1}}\,\frac{dF_{t-1}}{d\theta}$
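Putting the pieces together, one online RRL step might look like the following sketch. We keep $F_t$ continuous (no quantization) so the recurrent term $\partial F_t / \partial F_{t-1} = (1 - F_t^2)\,w$ is well defined; all names and default values are our own, not the project's:

```python
import math

def rrl_step(theta, dF_prev, F_prev, r_window, A_prev, B_prev,
             rho=0.1, eta=0.01, mu=1.0, delta=0.001):
    """One online RRL update: gradient ascent on the differential Sharpe ratio.

    r_window[0] is the current price change r_t; dF_prev is dF_{t-1}/dtheta.
    """
    x = [1.0] + list(r_window) + [F_prev]            # x_t
    F_t = math.tanh(sum(th * xi for th, xi in zip(theta, x)))

    # R_t and its derivatives w.r.t. F_t and F_{t-1}
    r_t = r_window[0]
    trade = F_t - F_prev
    sgn = (trade > 0) - (trade < 0)
    R_t = mu * (F_prev * r_t - delta * abs(trade))
    dR_dF = -mu * delta * sgn
    dR_dFprev = mu * r_t + mu * delta * sgn

    # recurrent gradient: dF_t/dtheta = (1 - F_t^2) * (x_t + w * dF_{t-1}/dtheta)
    w = theta[-1]
    dF = [(1.0 - F_t ** 2) * (xi + w * dfi) for xi, dfi in zip(x, dF_prev)]

    # dU_t/dR_t from the differential Sharpe ratio
    denom = (B_prev - A_prev ** 2) ** 1.5
    dU_dR = (B_prev - A_prev * R_t) / denom if denom > 0 else 0.0

    # gradient ascent step on theta
    grad = [dU_dR * (dR_dF * dfi + dR_dFprev * dfpi)
            for dfi, dfpi in zip(dF, dF_prev)]
    theta = [th + rho * g for th, g in zip(theta, grad)]

    # update the running moment estimates
    A_t = A_prev + eta * (R_t - A_prev)
    B_t = B_prev + eta * (R_t ** 2 - B_prev)
    return theta, dF, F_t, R_t, A_t, B_t
```

Iterating this step over the training series, for several epochs, is the whole learning procedure: there is no value function, only direct ascent on the utility.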
Project goals
◦ Investigate reinforcement learning by policy gradient
◦ Implement an automated trading system which learns its trading strategy by the recurrent reinforcement learning algorithm
◦ Analyze the system's results & structure
◦ Suggest and examine improvement methods
Results

Dataset: Artificial time series. Goals:
◦ Show the system can learn
◦ Analyze the effect of the parameters
◦ Validate various model approximations

Dataset: Real foreign exchange EUR/USD data. Goals:
◦ Show the system can learn a profitable strategy on real data
◦ Search for possible improvements
The challenges we face
◦ Model parameters:
  $M$ – the number of autoregressive inputs
  $\rho$ – the learning rate
  $\eta$ – the decay parameter / adaptation rate
  $n$ – the number of learning epochs
  $L_{\text{train}}$ – the size of the training set
  $L_{\text{test}}$ – the size of the test set
  $q_e, q_{\tanh}$ – the quantization levels for $e_t$ and for the $\tanh$ output
  $\delta$ – the transaction cost
◦ If and how should we normalize the learned weights?
◦ How should we normalize the input? The averages change over time (the series is non-stationary) – we assume that this change is slower than "how far back we look".
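One common answer to the input-normalization question, consistent with the slow-drift assumption above, is a causal rolling z-score. This is a sketch, not the project's actual scheme:

```python
import math

def rolling_zscore(r, window=50, eps=1e-8):
    """Normalize each price change by the mean and std of a trailing window,
    assuming the statistics drift slowly relative to the window length.
    Only past data (up to time t) is used, so the transform is causal."""
    out = []
    for t in range(len(r)):
        chunk = r[max(0, t - window + 1):t + 1]
        m = sum(chunk) / len(chunk)
        v = sum((x - m) ** 2 for x in chunk) / len(chunk)
        out.append((r[t] - m) / (math.sqrt(v) + eps))
    return out
```

The window length plays the role of "how far back we look": too short and the estimates are noisy, too long and they lag the non-stationary drift.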
Results – Artificial series
◦ $r_t$ – the return series are generated by an AR(p) process
◦ We analyze the effect of transaction costs, quantization levels, and the number of autoregressive inputs on the Sharpe ratio, the trading frequency, and the profits
◦ Effect of initial conditions
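A sketch of how such an artificial series can be generated; the AR(2) coefficients and noise level here are illustrative, not the project's actual values:

```python
import random

def ar2_prices(T=250, a1=0.6, a2=-0.3, sigma=1.0, z0=1000.0, seed=0):
    """Artificial price series whose returns follow a (stable) AR(2) process:
    r_t = a1*r_{t-1} + a2*r_{t-2} + gaussian noise; prices accumulate r_t."""
    rng = random.Random(seed)
    r = [0.0, 0.0]
    for _ in range(T):
        r.append(a1 * r[-1] + a2 * r[-2] + rng.gauss(0.0, sigma))
    z = [z0]
    for x in r[2:]:
        z.append(z[-1] + x)      # additive price model: z_t = z_{t-1} + r_t
    return z
```

An AR(p) series is a natural sanity check: its returns are predictable from their own past by construction, so a trader with at least p autoregressive inputs should be able to learn a profitable strategy.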
[Figure: behavior on an artificial series over 250 time steps – panels showing the price, the position $F_t$, the profits in %, and the Sharpe ratio.]
[Figure: trading frequency vs. transaction costs, for costs of 0%, 0.1%, 0.5% and 1%.]
[Figure: profits vs. transaction cost, for costs of 0%, 0.1%, 0.5% and 1%.]
[Figure: Sharpe's ratio per epoch, over roughly 80 learning epochs.]
[Figure: profits with prices generated by an i.i.d. process vs. profits with prices generated by an AR(2) process, over 250 time steps.]
[Figure: profits over 250 time steps for a 2-positions trader, a 3-positions trader with equal levels, and a conservative 3-positions trader, with the evolution of the long / neutral / short position fractions.]

Trader                     | % Long | % Neutral | % Short
2 positions                | 51.2%  | 0%        | 48.8%
3 positions                | 40%    | 25.2%     | 34.8%
3 positions (conservative) | 31.2%  | 48.8%     | 20%
Results – Real Forex Data
◦ The price series is the US Dollar vs. Euro exchange rate, from 21/05/2007 until 15/01/2010, sampled at 15-minute data points
◦ We compare our trader with
  ◦ a random strategy with a uniform distribution
  ◦ a Buy & Hold strategy of the Euro against the US Dollar
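The two baselines can be sketched as follows (the `monkey` name mirrors the plots' legend; the transaction-cost handling is simplified):

```python
import random

def strategy_profit(z, F, delta=0.0):
    """Total profit of a position sequence F over a price series z,
    with a proportional transaction cost delta per position change."""
    return sum(F[t - 1] * (z[t] - z[t - 1]) - delta * abs(F[t] - F[t - 1])
               for t in range(1, len(z)))

def buy_and_hold(z, delta=0.0):
    """Always long (the Euro against the US Dollar)."""
    return strategy_profit(z, [1] * len(z), delta)

def monkey(z, delta=0.0, seed=0):
    """Random strategy: positions drawn uniformly from {-1, 0, +1}."""
    rng = random.Random(seed)
    return strategy_profit(z, [rng.choice([-1, 0, 1]) for _ in z], delta)
```

With commissions, the random strategy is penalized on roughly two trades out of every three position draws, which is why it degrades fastest in the commissioned experiment below.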
No commissions
[Figure: profits (scale ×10⁻³) of our trader vs. the random "Monkey" and Buy & Hold strategies, alongside the EUR/USD price (1 EUR = x USD, range ≈ 1.337–1.344), over 250 time steps.]
With commissions (0.1%)
[Figure: profits of our trader vs. the random "Monkey" and Buy & Hold strategies with 0.1% commissions, alongside the EUR/USD price (1 EUR = x USD, range ≈ 1.337–1.344), over 250 time steps.]
Conclusions
◦ RRL performs better than the random strategy
◦ Positive Sharpe ratios were achieved in most cases
◦ RRL seems to struggle during volatile periods
◦ The large variance is a major cause for concern
◦ The system cannot unravel complex relationships in the data
◦ Changes in market conditions waste all of the system's learning from the training phase (but most learning systems suffer from this)
◦ When trading real data, the transaction cost is a killer
◦ Normalizing the input series can be a real challenge
  ◦ The input series are non-stationary
  ◦ We assume the averages change slowly relative to the number of AR inputs to the system
◦ Normalizing the weights is done heuristically
  ◦ The threshold method leads to the best results on both artificial and real data
◦ There is redundancy when the input series are ARMA processes
◦ Long training sessions under constant market conditions lead to overfitting
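The slides do not spell out the threshold method for normalizing the weights; the following clipping rule is our guess at it, shown purely as an illustration:

```python
def threshold_normalize(theta, c=1.0):
    """Heuristic weight normalization: clip each weight to [-c, c] so that
    no single input can dominate the tanh pre-activation. The threshold c
    is a hypothetical tuning parameter, not a value from the project."""
    return [max(-c, min(c, th)) for th in theta]
```

Keeping the weights bounded also keeps the pre-activation of the $\tanh$ away from its saturated region, where gradients vanish and learning stalls.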
Future work
◦ Wrapping the system with a risk-management layer (e.g. stop-loss, a retraining trigger, shutting the system down under anomalous behavior)
◦ Dynamic adjustment of external parameters (such as the learning rate)
◦ Working with more than one security
◦ Working with variable-size positions
◦ Working in coordination with another expert system (based on other algorithms)
Acknowledgment
We would like to thank our project supervisor Andrey Bernstein for his guidance, and Prof. Nahum Shimkin for advising us, allowing us to pursue a research project of our interest, and sharing his experience with us.
Additionally, we would like to thank Prof. Ron Meir and Prof. Neri Merhav for the time they spent consulting us.
Special warm thanks to Gabriel Molina from Stanford University and Tikesh Ramtohul from the University of Basel for their priceless help.
Questions?
References
[1] J. Moody, M. Saffell, "Learning to Trade via Direct Reinforcement", IEEE Transactions on Neural Networks, Vol. 12, No. 4, July 2001.
[2] C. Gold, "FX Trading via Recurrent Reinforcement Learning", IEEE CIFEr, Hong Kong, 2003.
[3] M. A. H. Dempster, V. Leemans, "An Automated FX Trading System Using Adaptive Reinforcement Learning", Expert Systems with Applications 30, pp. 543–552, 2006.