Algorithmic Trading with Learning
Ryerson University
Damir Kinzebulatov¹
(Fields Institute)
joint work with
Alvaro Cartea (University College London) and
Sebastian Jaimungal (University of Toronto)
¹ www.math.toronto.edu/dkinz
1 / 43
Asset price St
Suppose that at time t < T the trader has a prediction of ST.
ST is a random variable,
e.g. in high-frequency trading, obtained via data-analysis algorithms:

ST − S0 =   2·10⁻²  with prob 0.10
            10⁻²    with prob 0.20
            0       with prob 0.55
           −10⁻²    with prob 0.10
           −2·10⁻²  with prob 0.05
2 / 43
Naive strategy:
if E[ST ] > St ⇒ buy
Advanced strategy:
– would incorporate prediction ST in the asset price process St
– would learn from the realized dynamics of the asset price
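The naive rule above can be made concrete in a few lines. This is a hypothetical sketch, not code from the talk: the five-point distribution is the one on the previous slide, and the price values S0 and St are made-up numbers for illustration.

```python
# Naive strategy sketch: buy iff E[S_T] > S_t.
# Outcomes/probabilities are the five-point prediction from the slide;
# S0 and St are assumed illustrative values.
outcomes = [2e-2, 1e-2, 0.0, -1e-2, -2e-2]   # possible S_T - S_0
probs    = [0.10, 0.20, 0.55, 0.10, 0.05]

S0, St = 100.00, 100.001                      # assumed prices
E_ST = S0 + sum(x * p for x, p in zip(outcomes, probs))

signal = "buy" if E_ST > St else "sell/hold"
print(E_ST, signal)
```

Here the predicted mean exceeds the current price by a fraction of the tick, so the naive rule says buy; it uses only the mean and ignores the rest of the distribution, which is what the advanced strategy improves on.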
3 / 43
– incorporate prediction ST in the asset price process St . . .
A three-point prediction: ST = −5, 0, 5 with prob 0.7, 0.2, 0.1
[Figure: simulated midprice paths bridging to the three predicted terminal values]
4 / 43
Story 1: Asset price as a randomized Brownian bridge
5 / 43
Recall:
Brownian bridge βtT is a Gaussian process such that

β0T = βTT = 0,   βtT ∼ N( 0, t(T − t)/T )
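A quick numerical sanity check of the marginal law above (a sketch with assumed grid and sample sizes, not code from the talk): simulate bridges via βt = Wt − (t/T)WT and compare the empirical variance at t = T/2 with t(T − t)/T.

```python
import numpy as np

# Simulate standard Brownian bridges on [0, T] as beta_t = W_t - (t/T) W_T
# and check Var(beta_t) = t(T - t)/T at t = T/2. Grid/sample sizes assumed.
rng = np.random.default_rng(0)
T, n, paths = 1.0, 100, 50_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W = np.cumsum(dW, axis=1)                 # Brownian motion paths
t = np.arange(1, n + 1) * dt
beta = W - (t / T) * W[:, [-1]]           # pin the endpoint to 0

k = n // 2                                # index of t = 0.5
emp = beta[:, k - 1].var()
theory = t[k - 1] * (T - t[k - 1]) / T    # = 0.25
print(emp, theory)
```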
6 / 43
Algorithmic trading with learning – our model
St is a “randomized Brownian bridge”
St = S0 + σ βtT + (t/T) D
D – random change in asset price (distribution of D is known a priori)
βtT – Brownian bridge (‘noise’) independent of D
Thus, ST = S0 + D
t ↑ T ⇒ trader learns the realized value of D
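One simulated path of the randomized Brownian bridge makes the construction concrete. This is an illustrative sketch: the two-point prior for D and the parameter values are assumptions, not taken from this slide.

```python
import numpy as np

# One path of S_t = S_0 + sigma * beta_tT + (t/T) * D with a two-point
# prior for D. Parameter values are illustrative assumptions.
rng = np.random.default_rng(1)
S0, sigma, T, n = 1.0, 0.01, 1.0, 500
dt = T / n
t = np.linspace(dt, T, n)

D = rng.choice([0.02, -0.02], p=[0.8, 0.2])   # random terminal change
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
beta = W - (t / T) * W[-1]                    # Brownian bridge, beta_T = 0

S = S0 + sigma * beta + (t / T) * D
print(S[-1], S0 + D)                          # S_T = S_0 + D (up to float error)
```

Because the bridge vanishes at T, the terminal price is exactly S0 + D: as t ↑ T the noise dies out and the realized D is revealed.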
7 / 43
Insider trading is not possible
Let Ft = σ(Su : u ≤ t)
The trader has access only to the filtration Ft (but not to the filtration of βtT )
⇒ the trader can't distinguish between the noise βtT and D
8 / 43
What about the standard model?
St = S0 + σWt (“arithmetic BM”)
corresponds to the choice D ∼ N(0, σ²T)
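The claim can be checked by a variance identity: with D ∼ N(0, σ²T) independent of the bridge, Var(St − S0) = σ²t(T − t)/T + (t/T)²σ²T = σ²t, exactly the arithmetic-BM value. A small numerical confirmation (parameter values assumed):

```python
import numpy as np

# Check that the bridge-noise variance plus the Gaussian-prior variance
# reproduces the arithmetic BM variance sigma^2 * t at several times.
sigma, T = 0.3, 2.0
for t in np.linspace(0.1, 1.9, 5):
    var_bridge = sigma**2 * t * (T - t) / T   # from Var(beta_tT)
    var_prior  = (t / T)**2 * sigma**2 * T    # from (t/T) * D
    assert np.isclose(var_bridge + var_prior, sigma**2 * t)
print("variances match sigma^2 * t")
```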
9 / 43
Proposition: The asset price St satisfies

dSt = A(t, St) dt + σ dWt,   St|t=0 = S0,

where Wt is an Ft-Brownian motion,

A(t, S) = ( E[D | St = S] + S0 − S ) / (T − t)

and

E[D | St = S] = ∫ x exp( x(S − S0)/(σ²(T − t)) − x²t/(2σ²T(T − t)) ) µD(dx)
              / ∫ exp( x(S − S0)/(σ²(T − t)) − x²t/(2σ²T(T − t)) ) µD(dx).
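For a discrete prior µD the integrals become weighted sums, so the posterior mean and drift are a few lines of code. A sketch with an assumed two-point prior and illustrative parameters:

```python
import numpy as np

# Posterior mean E[D | S_t = S] and drift A(t, S) for a discrete prior
# mu_D with atoms xs and weights ps (illustrative parameters).
def cond_mean_D(S, t, S0, sigma, T, xs, ps):
    xs, ps = np.asarray(xs), np.asarray(ps)
    w = ps * np.exp(xs * (S - S0) / (sigma**2 * (T - t))
                    - xs**2 * t / (2 * sigma**2 * T * (T - t)))
    return (xs * w).sum() / w.sum()

def drift_A(S, t, S0, sigma, T, xs, ps):
    return (cond_mean_D(S, t, S0, sigma, T, xs, ps) + S0 - S) / (T - t)

xs, ps = [0.02, -0.02], [0.8, 0.2]   # assumed two-point prior
S0, sigma, T = 1.0, 0.01, 1.0
print(cond_mean_D(1.01, 0.5, S0, sigma, T, xs, ps))  # pulled toward +0.02
```

At S = S0 the exponential weights are symmetric in the atoms, so the posterior mean stays at the prior mean 0.8·0.02 − 0.2·0.02 = 0.012; a realized move toward +0.01 pushes it close to +0.02, which is the learning effect.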
10 / 43
Story 2: Trader’s optimization problem
(high-frequency trading)
11 / 43
Market microstructure: Limit Order Book
Oxford Centre for Industrial and Applied Mathematics:
An order matching a sell limit order is called a buy market order (not shown, because it is executed immediately!)
12 / 43
Market microstructure: Limit Order Book
To summarize:
– use buy market orders (MO) ⇒ pay higher prices
– use buy limit orders (LO) ⇒ pay lower prices, but have to wait . . .
(similarly for sell LO and sell MO)
13 / 43
Trader’s optimization problem: Strategy
Simplifying assumptions (not crucial)
– at each t post LOs & MOs for 0 or 1 units of asset, at best bid/ask price
⇒ trader’s strategy has 4 components:
ℓ+t ∈ {0, 1} (sell LO)
ℓ−t ∈ {0, 1} (buy LO)
m−t ∈ {0, 1} (buy MO)
m+t ∈ {0, 1} (sell MO)
– the spread is constant
14 / 43
Key quantities
Inventory:
Qt = −∫₀ᵗ ℓ+u dN+u + ∫₀ᵗ ℓ−u dN−u − m+t + m−t

where the Poisson processes N+t, N−t count the numbers of filled sell and buy LOs
Cash process
Xt =−∫ t
0
(St − ∆
2
)`−t 1{Qt6Q} dN
−t
+
∫ t
0
(St + ∆
2
)`+t 1{Qt>Q} dN
+t
−∫ t
0
(St + ∆
2 + ε)1{Qt6Q} dm
−t
+
∫ t
0
(St − ∆
2 − ε)1{Qt>Q} dm
+t
where ∆ = spread, ε is transaction fee for market order, St = midprice15 / 43
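The cash-process terms reduce to simple bookkeeping per fill: limit orders earn the half-spread, market orders pay it plus the fee. A sketch in discrete time (event stream, prices, and parameter values are made up; the inventory-bound indicators are omitted for brevity):

```python
# Per-event update of inventory Q and cash X. A filled sell LO earns
# S + spread/2, a filled buy LO pays S - spread/2; MOs cross the spread
# and pay the fee eps. Values and events are illustrative assumptions.
spread, eps = 0.01, 0.002
Q, X = 0, 0.0

def apply(event, S, Q, X):
    if event == "sell_LO_filled":
        return Q - 1, X + (S + spread / 2)
    if event == "buy_LO_filled":
        return Q + 1, X - (S - spread / 2)
    if event == "sell_MO":
        return Q - 1, X + (S - spread / 2 - eps)
    if event == "buy_MO":
        return Q + 1, X - (S + spread / 2 + eps)
    return Q, X

for event, S in [("buy_LO_filled", 1.00), ("sell_MO", 1.01)]:
    Q, X = apply(event, S, Q, X)
print(Q, X)   # flat inventory; P&L = price gain minus half-spreads and fee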
Constraints on inventory:
Q̲ ≤ Qt ≤ Q̄ and QT = 0
16 / 43
[Figure: a sample asset price path (top) and the corresponding inventory path (bottom)]
17 / 43
Trader’s optimization problem: Goal
Goal: find
sup over {ℓ±t}t≤T, {m±t}t≤T of  E[ XT + QT ( ST − (∆/2) sgn(QT) − αQT ) ]    (1)
– 1st term: cash from trading
– 2nd term: profit/cost from closing the position at T
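The second term of the objective is easy to evaluate on its own. A sketch of the terminal reward (spread and α values are assumed, not from the talk):

```python
import numpy as np

# Terminal reward of objective (1): cash plus liquidation value of the
# residual inventory, penalized by the half-spread and alpha * Q_T^2.
def terminal_value(X_T, Q_T, S_T, spread=0.01, alpha=0.001):
    return X_T + Q_T * (S_T - (spread / 2) * np.sign(Q_T) - alpha * Q_T)

print(terminal_value(0.0, 10, 1.0))   # long 10 units at S_T = 1
```

Being long 10 units at T is worth slightly less than 10·ST: each unit loses the half-spread on liquidation and the quadratic penalty discourages a large final position.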
So far midprice St was any process . . . We want RBB
St = S0 + σ βtT + (t/T) D
18 / 43
Dynamic programming
Since RBB St satisfies an SDE
dSt = A(t, St) dt+ σ dWt
we can use Dynamic Programming to solve the optimization problem
19 / 43
Dynamic programming
Goal: find the value function
H(t, S, Q, X) = sup over ℓ±, m± of E[ XT + QT ( ST − (∆/2) sgn(QT) − αQT ) | St = S, Qt = Q, Xt = X ]
20 / 43
Dynamic programming
The value function H admits the representation

H(t, X, S, Q) = X + QS + g(t, S, Q)

where g solves (in the viscosity sense) the system of non-linear PDEs

0 = max{ ∂t g + ½σ² ∂SS g + A(t, S)(Q + ∂S g) − ϕQ²
         + 1{Q < Q̄} max over ℓ−∈{0,1} of λ− [ ℓ− ∆/2 + g(t, S, Q + ℓ−) − g ]
         + 1{Q > Q̲} max over ℓ+∈{0,1} of λ+ [ ℓ+ ∆/2 + g(t, S, Q − ℓ+) − g ] ;
       max{ −∆/2 − ε + g(t, S, Q + 1) − g,  −∆/2 − ε + g(t, S, Q − 1) − g,  0 } }

subject to the terminal condition

g(T, S, Q) = −(∆/2)|Q| − αQ²,   Q̲ ≤ Q ≤ Q̄
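To make the structure of the system concrete, here is a rough explicit finite-difference sketch: step backward from the terminal condition, apply the diffusion and limit-order terms, then take the market-order intervention maximum. This is an illustration, not the authors' scheme; for simplicity the drift is set to the uninformed case A ≡ 0 (arithmetic BM), and all grids, intensities, and penalty values are assumptions.

```python
import numpy as np

# Explicit backward scheme for g on a small (S, Q) grid, drift A = 0.
# All parameter values below are illustrative assumptions.
sigma, spread, eps, alpha, phi = 0.01, 0.01, 0.002, 0.001, 1e-5
lam = 5.0                                   # LO fill intensity (both sides)
T, nt = 1.0, 400
dt = T / nt
S = np.linspace(0.95, 1.05, 41)
dS = S[1] - S[0]
Q = np.arange(-3, 4)                        # inventory grid -3 .. 3

# terminal condition g(T, S, Q) = -(spread/2)|Q| - alpha * Q^2
g = np.broadcast_to(-(spread / 2) * np.abs(Q) - alpha * Q**2,
                    (S.size, Q.size)).copy()

for _ in range(nt):
    gSS = np.zeros_like(g)
    gSS[1:-1] = (g[2:] - 2 * g[1:-1] + g[:-2]) / dS**2
    cont = g + dt * (0.5 * sigma**2 * gSS - phi * Q[None, :]**2)
    # limit orders: post (l = 1) only when the bracket is positive
    cont[:, :-1] += dt * lam * np.maximum(0.0, spread / 2 + g[:, 1:] - g[:, :-1])  # buy LO, Q < Qmax
    cont[:, 1:]  += dt * lam * np.maximum(0.0, spread / 2 + g[:, :-1] - g[:, 1:])  # sell LO, Q > Qmin
    # market-order intervention: shift Q at cost spread/2 + eps
    g = cont.copy()
    g[:, :-1] = np.maximum(g[:, :-1], -spread / 2 - eps + cont[:, 1:])   # buy MO
    g[:, 1:]  = np.maximum(g[:, 1:],  -spread / 2 - eps + cont[:, :-1])  # sell MO

H0 = 0.0 + 0 * 1.0 + g[S.size // 2, Q.size // 2]   # H(0, X=0, S=1, Q=0)
print(H0)
```

Even with zero drift the value at t = 0 is positive: the trader earns the spread by posting on both sides, which is the classic market-making component of the strategy.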
21 / 43
Example
22 / 43
Example
Informed trader (IT) believes that
D =  0.02 with prob 0.8
    −0.02 with prob 0.2

Compare the performance of the IT with
– an uninformed trader (UT) who views
  D ∼ N(0, σ²T)
  (i.e. St is an arithmetic BM)
– an uninformed trader with learning (UL) who believes
  D = 0.02, −0.02 with prob 0.5, 0.5
23 / 43
Example
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of UT
who views the midprice as a Brownian motion
24 / 43
Example
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of UL
who views D = −0.02, 0.02 with prob 0.5
25 / 43
Example
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of IT
who views D = −0.02, 0.02 with prob 0.2, 0.8
Note: for large volatility the IT stops learning.
26 / 43
Example
[Figure: mean P&L vs std of P&L for the three agents (IwL, UwL, UwoL) as the bounds on inventory increase]
Risk-Reward profiles for the three types of agents as inventory bound increases
27 / 43
Example
[Figure: mean numbers of executed buy/sell limit orders and market orders per time interval]
UT: the mean executed Limit and Market orders
28 / 43
Example
[Figure: mean numbers of executed buy/sell limit orders and market orders per time interval]
UL: the mean executed Limit and Market orders
29 / 43
Example
[Figure: mean numbers of executed buy/sell limit orders and market orders per time interval]
IT: the mean executed Limit and Market orders
30 / 43
Multiple assets
31 / 43
Multiple assets
Asset midprices S are randomized Brownian bridges
S(i)t = S(i)0 + σ(i) β(i)tT + (t/T) D(i)

β(i)tT – mutually independent standard Brownian bridges
D(i) – the random changes in asset prices; may be dependent
– asset prices interact non-linearly through D = (D(i))
– the IT may trade in an asset that has high volatility, and in which they are
marginally uninformed, but can learn joint information from a second, less
volatile asset
32 / 43
Multiple assets
For illustration purposes...
Probability of outcomes:

                  D(1) = −0.02   D(1) = +0.02
D(2) = −0.02          0.45           0.05
D(2) = +0.02          0.05           0.45

σ(1) = 0.02 and σ(2) = 0.01

Observing solely S(1) or S(2), the agent is uninformed
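The point of the table can be verified directly: both marginals of (D(1), D(2)) are symmetric, so neither asset alone carries directional information, while the strong positive dependence makes the pair informative. A short check:

```python
import numpy as np

# Joint prior of (D1, D2) from the table; rows index D2, columns index D1.
P = np.array([[0.45, 0.05],    # D2 = -0.02
              [0.05, 0.45]])   # D2 = +0.02
d = np.array([-0.02, 0.02])

m1, m2 = P.sum(axis=0), P.sum(axis=1)   # marginals of D1, D2
print(m1 @ d, m2 @ d)                   # both marginal means are 0
print(d @ P @ d)                        # E[D1 * D2] > 0: assets co-move
```

Each marginal is (0.5, 0.5) with mean zero, yet E[D1·D2] = 0.00032 > 0, so watching the less volatile second asset sharpens the posterior about D(1).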
33 / 43
Multiple assets
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of trader who excludes Asset 2 from their info
34 / 43
Multiple assets
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of trader who includes Asset 2 in their info
35 / 43
Conclusions
– Agents who have info can outperform other traders
– We show how to trade when info is uncertain
– The optimal strategy learns from the midprice dynamics and outperforms naive strategies
– Including info from other assets can add value in assets in which learning does not help
Thank you!
www.math.toronto.edu/dkinz
36 / 43