Algorithmic Trading with Learning
Ryerson University
Damir Kinzebulatov¹
(Fields Institute)
joint work with
Alvaro Cartea (University College London) and
Sebastian Jaimungal (University of Toronto)
¹ www.math.toronto.edu/dkinz
1 / 43
Asset price St
Suppose that at time t < T the trader has a prediction of ST.
ST is a random variable,
e.g. in high-frequency trading, obtained via data-analysis algorithms:

ST − S0 =   2·10⁻²  with prob 0.10
            10⁻²    with prob 0.20
            0       with prob 0.55
           −10⁻²    with prob 0.10
           −2·10⁻²  with prob 0.05
2 / 43
Naive strategy:
if E[ST ] > St ⇒ buy
Advanced strategy:
– would incorporate prediction ST in the asset price process St
– would learn from the realized dynamics of the asset price
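The naive rule above can be made concrete in a few lines. This is a hypothetical sketch, not code from the talk: the five-point distribution is the one on the previous slide, and the price values S0 and St are made-up numbers for illustration.

```python
# Naive strategy sketch: buy iff E[S_T] > S_t.
# Outcomes/probabilities are the five-point prediction from the slide;
# S0 and St are assumed illustrative values.
outcomes = [2e-2, 1e-2, 0.0, -1e-2, -2e-2]   # possible S_T - S_0
probs    = [0.10, 0.20, 0.55, 0.10, 0.05]

S0, St = 100.00, 100.001                      # assumed prices
E_ST = S0 + sum(x * p for x, p in zip(outcomes, probs))

signal = "buy" if E_ST > St else "sell/hold"
print(E_ST, signal)
```

Here the predicted mean exceeds the current price by a fraction of the tick, so the naive rule says buy; it uses only the mean and ignores the rest of the distribution, which is what the advanced strategy improves on.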
3 / 43
– incorporate prediction ST in the asset price process St . . .
A three-point prediction: ST = −5, 0, 5 with prob 0.7, 0.2, 0.1
[Figure: simulated midprice paths bridging to the three predicted terminal values]
4 / 43
Story 1: Asset price as a randomized Brownian bridge
5 / 43
Recall:
Brownian bridge βtT is a Gaussian process such that

β0T = βTT = 0,   βtT ∼ N( 0, t(T − t)/T )
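A quick numerical sanity check of the marginal law above (a sketch with assumed grid and sample sizes, not code from the talk): simulate bridges via βt = Wt − (t/T)WT and compare the empirical variance at t = T/2 with t(T − t)/T.

```python
import numpy as np

# Simulate standard Brownian bridges on [0, T] as beta_t = W_t - (t/T) W_T
# and check Var(beta_t) = t(T - t)/T at t = T/2. Grid/sample sizes assumed.
rng = np.random.default_rng(0)
T, n, paths = 1.0, 100, 50_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W = np.cumsum(dW, axis=1)                 # Brownian motion paths
t = np.arange(1, n + 1) * dt
beta = W - (t / T) * W[:, [-1]]           # pin the endpoint to 0

k = n // 2                                # index of t = 0.5
emp = beta[:, k - 1].var()
theory = t[k - 1] * (T - t[k - 1]) / T    # = 0.25
print(emp, theory)
```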
6 / 43
Algorithmic trading with learning – our model
St is a “randomized Brownian bridge”
St = S0 + σ βtT + (t/T) D
D – random change in asset price (distribution of D is known a priori)
βtT – Brownian bridge (‘noise’) independent of D
Thus, ST = S0 + D
t ↑ T ⇒ trader learns the realized value of D
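One simulated path of the randomized Brownian bridge makes the construction concrete. This is an illustrative sketch: the two-point prior for D and the parameter values are assumptions, not taken from this slide.

```python
import numpy as np

# One path of S_t = S_0 + sigma * beta_tT + (t/T) * D with a two-point
# prior for D. Parameter values are illustrative assumptions.
rng = np.random.default_rng(1)
S0, sigma, T, n = 1.0, 0.01, 1.0, 500
dt = T / n
t = np.linspace(dt, T, n)

D = rng.choice([0.02, -0.02], p=[0.8, 0.2])   # random terminal change
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
beta = W - (t / T) * W[-1]                    # Brownian bridge, beta_T = 0

S = S0 + sigma * beta + (t / T) * D
print(S[-1], S0 + D)                          # S_T = S_0 + D (up to float error)
```

Because the bridge vanishes at T, the terminal price is exactly S0 + D: as t ↑ T the noise dies out and the realized D is revealed.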
7 / 43
Insider trading is not possible
Let Ft = σ(Su : u ≤ t)
The trader has access only to the filtration Ft (but not to the filtration of βtT )
⇒ the trader can't distinguish between the noise βtT and D
8 / 43
What about the standard model?
St = S0 + σWt (“arithmetic BM”)
corresponds to the choice D ∼ N(0, σ²T)
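The claim can be checked by a variance identity: with D ∼ N(0, σ²T) independent of the bridge, Var(St − S0) = σ²t(T − t)/T + (t/T)²σ²T = σ²t, exactly the arithmetic-BM value. A small numerical confirmation (parameter values assumed):

```python
import numpy as np

# Check that the bridge-noise variance plus the Gaussian-prior variance
# reproduces the arithmetic BM variance sigma^2 * t at several times.
sigma, T = 0.3, 2.0
for t in np.linspace(0.1, 1.9, 5):
    var_bridge = sigma**2 * t * (T - t) / T   # from Var(beta_tT)
    var_prior  = (t / T)**2 * sigma**2 * T    # from (t/T) * D
    assert np.isclose(var_bridge + var_prior, sigma**2 * t)
print("variances match sigma^2 * t")
```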
9 / 43
Proposition: The asset price St satisfies

dSt = A(t, St) dt + σ dWt,   St|t=0 = S0,

where Wt is an Ft-Brownian motion,

A(t, S) = ( E[D | St = S] + S0 − S ) / (T − t)

and

E[D | St = S] = ∫ x exp( x(S − S0)/(σ²(T − t)) − x²t/(2σ²T(T − t)) ) µD(dx)
              / ∫ exp( x(S − S0)/(σ²(T − t)) − x²t/(2σ²T(T − t)) ) µD(dx).
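For a discrete prior µD the integrals become weighted sums, so the posterior mean and drift are a few lines of code. A sketch with an assumed two-point prior and illustrative parameters:

```python
import numpy as np

# Posterior mean E[D | S_t = S] and drift A(t, S) for a discrete prior
# mu_D with atoms xs and weights ps (illustrative parameters).
def cond_mean_D(S, t, S0, sigma, T, xs, ps):
    xs, ps = np.asarray(xs), np.asarray(ps)
    w = ps * np.exp(xs * (S - S0) / (sigma**2 * (T - t))
                    - xs**2 * t / (2 * sigma**2 * T * (T - t)))
    return (xs * w).sum() / w.sum()

def drift_A(S, t, S0, sigma, T, xs, ps):
    return (cond_mean_D(S, t, S0, sigma, T, xs, ps) + S0 - S) / (T - t)

xs, ps = [0.02, -0.02], [0.8, 0.2]   # assumed two-point prior
S0, sigma, T = 1.0, 0.01, 1.0
print(cond_mean_D(1.01, 0.5, S0, sigma, T, xs, ps))  # pulled toward +0.02
```

At S = S0 the exponential weights are symmetric in the atoms, so the posterior mean stays at the prior mean 0.8·0.02 − 0.2·0.02 = 0.012; a realized move toward +0.01 pushes it close to +0.02, which is the learning effect.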
10 / 43
Story 2: Trader’s optimization problem
(high-frequency trading)
11 / 43
Market microstructure: Limit Order Book
Oxford Centre for Industrial and Applied Mathematics:
An order matching a sell limit order is called a buy market order (not shown, because it is executed immediately!)
12 / 43
Market microstructure: Limit Order Book
To summarize:
– use buy market orders (MO) ⇒ pay higher prices
– use buy limit orders (LO) ⇒ pay lower prices, but have to wait . . .
(similarly for sell LO and sell MO)
13 / 43
Trader’s optimization problem: Strategy
Simplifying assumptions (not crucial)
– at each t post LOs & MOs for 0 or 1 units of asset, at best bid/ask price
⇒ trader’s strategy has 4 components:
ℓ+t ∈ {0, 1} (sell LO)
ℓ−t ∈ {0, 1} (buy LO)
m−t ∈ {0, 1} (buy MO)
m+t ∈ {0, 1} (sell MO)
– the spread is constant
14 / 43
Key quantities
Inventory:
Qt = −∫₀ᵗ ℓ+u dN+u + ∫₀ᵗ ℓ−u dN−u − m+t + m−t

where the Poisson processes N+t, N−t count the numbers of filled sell and buy LOs
Cash process
Xt =−∫ t
0
(St − ∆
2
)`−t 1{Qt6Q} dN
−t
+
∫ t
0
(St + ∆
2
)`+t 1{Qt>Q} dN
+t
−∫ t
0
(St + ∆
2 + ε)1{Qt6Q} dm
−t
+
∫ t
0
(St − ∆
2 − ε)1{Qt>Q} dm
+t
where ∆ = spread, ε is transaction fee for market order, St = midprice15 / 43
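The cash-process terms reduce to simple bookkeeping per fill: limit orders earn the half-spread, market orders pay it plus the fee. A sketch in discrete time (event stream, prices, and parameter values are made up; the inventory-bound indicators are omitted for brevity):

```python
# Per-event update of inventory Q and cash X. A filled sell LO earns
# S + spread/2, a filled buy LO pays S - spread/2; MOs cross the spread
# and pay the fee eps. Values and events are illustrative assumptions.
spread, eps = 0.01, 0.002
Q, X = 0, 0.0

def apply(event, S, Q, X):
    if event == "sell_LO_filled":
        return Q - 1, X + (S + spread / 2)
    if event == "buy_LO_filled":
        return Q + 1, X - (S - spread / 2)
    if event == "sell_MO":
        return Q - 1, X + (S - spread / 2 - eps)
    if event == "buy_MO":
        return Q + 1, X - (S + spread / 2 + eps)
    return Q, X

for event, S in [("buy_LO_filled", 1.00), ("sell_MO", 1.01)]:
    Q, X = apply(event, S, Q, X)
print(Q, X)   # flat inventory; P&L = price gain minus half-spreads and fee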
Constraints on inventory:
Q̲ ≤ Qt ≤ Q̄ and QT = 0
16 / 43
[Figure: a sample asset price path (top) and the corresponding inventory path (bottom)]
17 / 43
Trader’s optimization problem: Goal
Goal: find
sup over {ℓ±t}t≤T, {m±t}t≤T of  E[ XT + QT ( ST − (∆/2) sgn(QT) − αQT ) ]    (1)
– 1st term: cash from trading
– 2nd term: profit/cost from closing the position at T
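The second term of the objective is easy to evaluate on its own. A sketch of the terminal reward (spread and α values are assumed, not from the talk):

```python
import numpy as np

# Terminal reward of objective (1): cash plus liquidation value of the
# residual inventory, penalized by the half-spread and alpha * Q_T^2.
def terminal_value(X_T, Q_T, S_T, spread=0.01, alpha=0.001):
    return X_T + Q_T * (S_T - (spread / 2) * np.sign(Q_T) - alpha * Q_T)

print(terminal_value(0.0, 10, 1.0))   # long 10 units at S_T = 1
```

Being long 10 units at T is worth slightly less than 10·ST: each unit loses the half-spread on liquidation and the quadratic penalty discourages a large final position.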
So far midprice St was any process . . . We want RBB
St = S0 + σ βtT + (t/T) D
18 / 43
Dynamic programming
Since RBB St satisfies an SDE
dSt = A(t, St) dt+ σ dWt
we can use Dynamic Programming to solve the optimization problem
19 / 43
Dynamic programming
Goal: find the value function
H(t, S, Q, X) = sup over ℓ±, m± of E[ XT + QT ( ST − (∆/2) sgn(QT) − αQT ) | St = S, Qt = Q, Xt = X ]
20 / 43
Dynamic programming
The value function H admits the representation

H(t, X, S, Q) = X + QS + g(t, S, Q)

where g solves (in the viscosity sense) the system of non-linear PDEs

0 = max{ ∂t g + ½σ² ∂SS g + A(t, S)(Q + ∂S g) − ϕQ²
         + 1{Q < Q̄} max over ℓ−∈{0,1} of λ− [ ℓ− ∆/2 + g(t, S, Q + ℓ−) − g ]
         + 1{Q > Q̲} max over ℓ+∈{0,1} of λ+ [ ℓ+ ∆/2 + g(t, S, Q − ℓ+) − g ] ;
       max{ −∆/2 − ε + g(t, S, Q + 1) − g,  −∆/2 − ε + g(t, S, Q − 1) − g,  0 } }

subject to the terminal condition

g(T, S, Q) = −(∆/2)|Q| − αQ²,   Q̲ ≤ Q ≤ Q̄
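To make the structure of the system concrete, here is a rough explicit finite-difference sketch: step backward from the terminal condition, apply the diffusion and limit-order terms, then take the market-order intervention maximum. This is an illustration, not the authors' scheme; for simplicity the drift is set to the uninformed case A ≡ 0 (arithmetic BM), and all grids, intensities, and penalty values are assumptions.

```python
import numpy as np

# Explicit backward scheme for g on a small (S, Q) grid, drift A = 0.
# All parameter values below are illustrative assumptions.
sigma, spread, eps, alpha, phi = 0.01, 0.01, 0.002, 0.001, 1e-5
lam = 5.0                                   # LO fill intensity (both sides)
T, nt = 1.0, 400
dt = T / nt
S = np.linspace(0.95, 1.05, 41)
dS = S[1] - S[0]
Q = np.arange(-3, 4)                        # inventory grid -3 .. 3

# terminal condition g(T, S, Q) = -(spread/2)|Q| - alpha * Q^2
g = np.broadcast_to(-(spread / 2) * np.abs(Q) - alpha * Q**2,
                    (S.size, Q.size)).copy()

for _ in range(nt):
    gSS = np.zeros_like(g)
    gSS[1:-1] = (g[2:] - 2 * g[1:-1] + g[:-2]) / dS**2
    cont = g + dt * (0.5 * sigma**2 * gSS - phi * Q[None, :]**2)
    # limit orders: post (l = 1) only when the bracket is positive
    cont[:, :-1] += dt * lam * np.maximum(0.0, spread / 2 + g[:, 1:] - g[:, :-1])  # buy LO, Q < Qmax
    cont[:, 1:]  += dt * lam * np.maximum(0.0, spread / 2 + g[:, :-1] - g[:, 1:])  # sell LO, Q > Qmin
    # market-order intervention: shift Q at cost spread/2 + eps
    g = cont.copy()
    g[:, :-1] = np.maximum(g[:, :-1], -spread / 2 - eps + cont[:, 1:])   # buy MO
    g[:, 1:]  = np.maximum(g[:, 1:],  -spread / 2 - eps + cont[:, :-1])  # sell MO

H0 = 0.0 + 0 * 1.0 + g[S.size // 2, Q.size // 2]   # H(0, X=0, S=1, Q=0)
print(H0)
```

Even with zero drift the value at t = 0 is positive: the trader earns the spread by posting on both sides, which is the classic market-making component of the strategy.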
21 / 43
Example
22 / 43
Example
Informed trader (IT) believes that
D =  0.02 with prob 0.8
    −0.02 with prob 0.2

Compare the performance of the IT with
– an uninformed trader (UT) who views
  D ∼ N(0, σ²T)
  (i.e. St is an arithmetic BM)
– an uninformed trader with learning (UL) who believes
  D = 0.02, −0.02 with prob 0.5, 0.5
23 / 43
Example
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of UT
who views the midprice as a Brownian motion
24 / 43
Example
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of UL
who views D = −0.02, 0.02 with prob 0.5
25 / 43
Example
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of IT
who views D = −0.02, 0.02 with prob 0.2, 0.8
Note: for large volatility the IT stops learning.
26 / 43
Example
[Figure: mean P&L vs std of P&L for the three agents (IwL, UwL, UwoL) as the bounds on inventory increase]
Risk-Reward profiles for the three types of agents as inventory bound increases
27 / 43
Example
[Figure: mean numbers of executed buy/sell limit orders and market orders per time interval]
UT: the mean executed Limit and Market orders
28 / 43
Example
[Figure: mean numbers of executed buy/sell limit orders and market orders per time interval]
UL: the mean executed Limit and Market orders
29 / 43
Example
[Figure: mean numbers of executed buy/sell limit orders and market orders per time interval]
IT: the mean executed Limit and Market orders
30 / 43
Multiple assets
31 / 43
Multiple assets
Asset midprices S are randomized Brownian bridges
S(i)t = S(i)0 + σ(i) β(i)tT + (t/T) D(i)

β(i)tT – mutually independent standard Brownian bridges
D(i) – the random changes in asset prices; may be dependent
– asset prices interact non-linearly through D = (D(i))
– the IT may trade in an asset that has high volatility, and in which they are
marginally uninformed, but can learn joint information from a second, less
volatile asset
32 / 43
Multiple assets
For illustration purposes...
Probability of outcomes:

                  D(1) = −0.02   D(1) = +0.02
D(2) = −0.02          0.45           0.05
D(2) = +0.02          0.05           0.45

σ(1) = 0.02 and σ(2) = 0.01

Observing solely S(1) or S(2), the agent is uninformed
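The point of the table can be verified directly: both marginals of (D(1), D(2)) are symmetric, so neither asset alone carries directional information, while the strong positive dependence makes the pair informative. A short check:

```python
import numpy as np

# Joint prior of (D1, D2) from the table; rows index D2, columns index D1.
P = np.array([[0.45, 0.05],    # D2 = -0.02
              [0.05, 0.45]])   # D2 = +0.02
d = np.array([-0.02, 0.02])

m1, m2 = P.sum(axis=0), P.sum(axis=1)   # marginals of D1, D2
print(m1 @ d, m2 @ d)                   # both marginal means are 0
print(d @ P @ d)                        # E[D1 * D2] > 0: assets co-move
```

Each marginal is (0.5, 0.5) with mean zero, yet E[D1·D2] = 0.00032 > 0, so watching the less volatile second asset sharpens the posterior about D(1).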
33 / 43
Multiple assets
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of trader who excludes Asset 2 from their info
34 / 43
Multiple assets
[Figure: asset price path (top) and inventory path (bottom)]
The strategy of trader who includes Asset 2 in their info
35 / 43
Conclusions
– Agents who have info can outperform other traders
– We show how to trade when info is uncertain
– The optimal strategy learns from the midprice dynamics and outperforms naive strategies
– Including info from other assets can add value in assets in which learning does not help
Thank you!
www.math.toronto.edu/dkinz
36 / 43