applications of machine learning in high frequency trading

Applications of Machine Learning in High Frequency Trading

HELLO! We are:

Aashish Jhamtani (15BM6JP01)
Ayan Sengupta (15BM6JP09)
Neeti Pokharna (15BM6JP27)
Siddhant Sanjeev (15BM6JP45)
Sujan Kumar Biswas (15BM6JP49)

What is High Frequency Trading?

Mandatory Disclaimer: All characters and events depicted in this project are entirely modeled. Any similarity to actual events or stock price movements is purely awesome.

Using computer algorithms to rapidly trade securities

● Positions are held for seconds to minutes

● Reaction times to market changes are sub-millisecond

● HFT accounts for more than 60% of all trading volume in some markets

● Due to market efficiency, it is challenging to come up with robust predictive models

● Columns include timestamp, price, order-flag, size, etc.

Challenges and Scope

◉ Machine learning faces special challenges here because of the very fine granularity of the data.

◉ A lack of understanding of how such low-level data relates to actionable circumstances (such as profitably buying or selling shares, optimally executing a large order, etc.).

◉ No prior intuitions about how (say) the distribution of liquidity in the order book relates to future price movements, if at all.

SCOPE: A comparative study of various ML strategies and their performance on data obtained through the Bloomberg Terminal, with the final aim of designing a successful strategy for this setting.

What Does the Data Look Like?


Holt-Winters Model

Exponential smoothing applies as many as three low-pass filters with exponential window functions (level, trend and seasonality in the Holt-Winters form). It was used to predict the output for the next tick, and the MSE was found to be 253.9. The output curve is shown in the accompanying plot.
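To make the next-tick forecast and its MSE concrete, here is a minimal sketch of the simplest member of this family, single exponential smoothing. The smoothing factor `alpha` and the toy series are assumptions for illustration; the deck does not state which parameters were used, and the Holt-Winters form would add trend and seasonal smoothers on top of this level term.

```python
def one_step_forecasts(series, alpha):
    """Return the forecast made for each tick before that tick is observed."""
    level = series[0]  # initialise the smoothed level with the first observation
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)  # forecast for this tick is the current level
        level = alpha * value + (1 - alpha) * level  # low-pass update
    return forecasts


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    ticks = [10.0, 12.0, 14.0]  # hypothetical prices
    preds = one_step_forecasts(ticks, alpha=0.5)
    print(preds, mean_squared_error(ticks[1:], preds))
```

A larger `alpha` follows recent ticks more closely, while a smaller one smooths more heavily.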

Feed Forward Model

A feed-forward neural network is an artificial neural network in which connections between the units do not form a cycle. Here it was used to predict the output for the next tick, and the MSE was found to be 144.67. The output curve is shown in the accompanying plot.
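As an illustration of "no cycles", the sketch below runs a single forward pass through one hidden layer: inputs flow to hidden units and then to one output, and nothing feeds back. The layer sizes, the tanh activation and the weights are assumptions made for the example, not the network used in the project.

```python
import math


def feed_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def mean_squared_error(actual, predicted):
    """Average squared prediction error, the metric quoted on the slide."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical weights: two lagged prices in, two hidden units, one output.
    prediction = feed_forward(
        inputs=[0.2, -0.1],
        hidden_weights=[[0.5, -0.3], [0.1, 0.8]],
        hidden_biases=[0.0, 0.1],
        output_weights=[1.0, -1.0],
        output_bias=0.05,
    )
    print(prediction)
```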

News Sentiment Analysis

The Stanford Parser was used to get the dependency graph for each sentence. Sentiment was then judged using SentiWordNet. The output graph is shown in the accompanying plot.

Black-Scholes Model

● Stock prices follow geometric Brownian motion, i.e. prices are lognormally distributed (log returns are normal) with constant drift and volatility

● For the one-period binomial model, the call price under the risk-neutral probability measure Q in an arbitrage-free market is

● The call price C(t,x) satisfies the equation

● Solving the above equation gives us
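For reference, the three results these bullets point to are usually written as below. This is the standard textbook form, not necessarily the notation used on the slides: the strike K, maturity T, risk-free rate r, volatility σ, standard normal CDF Φ, and the binomial up and down factors u and d are symbols assumed here.

```latex
% One-period binomial call price under the risk-neutral measure Q
C_0 = \frac{1}{1+r}\,\mathbb{E}^{Q}\!\left[(S_1 - K)^{+}\right]
    = \frac{1}{1+r}\left[\,q\,(uS_0 - K)^{+} + (1-q)\,(dS_0 - K)^{+}\right],
\qquad q = \frac{(1+r) - d}{u - d}

% Black-Scholes partial differential equation for the call price C(t,x)
\frac{\partial C}{\partial t}
  + r x \frac{\partial C}{\partial x}
  + \tfrac{1}{2}\sigma^{2} x^{2} \frac{\partial^{2} C}{\partial x^{2}}
  - r C = 0,
\qquad C(T,x) = (x - K)^{+}

% Closed-form solution
C(t,x) = x\,\Phi(d_1) - K e^{-r(T-t)}\,\Phi(d_2),
\qquad
d_{1,2} = \frac{\ln(x/K) + \left(r \pm \tfrac{1}{2}\sigma^{2}\right)(T-t)}{\sigma\sqrt{T-t}}
```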

Limitations of Black-Scholes R Simulation

● Log returns of stocks do not always follow a normal distribution, as the accompanying graph shows.

● The fixed risk-free rate, no-dividend and fixed-volatility assumptions are not always valid.

● The market is not complete because of transaction costs, i.e. it is not always possible to choose a suitable hedging portfolio (a_H, b_H) in risky and risk-free assets.

● Out-of-the-money performance of the BS model is not as good as in-the-money performance.

● The model gives a fixed no-arbitrage price for any option on the stock.

Markov Diffusion Model

● The drift and volatility follow a finite-state Markov chain in continuous time.

● For example, assume drift and volatility are both 2-state processes. Then the stock price follows the Markov switching diffusion model:

● The arbitrage-free European call price for maturity T and strike price K is then given by two solutions, one per state:

Markov Diffusion Model R Simulation

• If drift and volatility are the same in both states, this model reduces to the Black-Scholes model.

• The market is not complete because of the state transitions.

• The output is shown in the accompanying plot.
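The project's simulation was done in R and is not reproduced in the deck. As a rough Python sketch of the same idea, the function below simulates one terminal price from a 2-state Markov switching diffusion, where the drift and volatility depend on the current state. The function name, the switching rates and the simple per-step switching rule are assumptions for illustration.

```python
import math
import random


def simulate_switching_gbm(s0, mus, sigmas, switch_rates, horizon, n_steps,
                           rng=None, state=0):
    """Simulate the terminal price of a 2-state Markov switching diffusion.

    mus[i] and sigmas[i] are the drift and volatility in state i, and
    switch_rates[i] is the rate of leaving state i (probability rate * dt per step).
    """
    rng = rng or random.Random(0)
    dt = horizon / n_steps
    log_price = math.log(s0)
    for _ in range(n_steps):
        mu, sigma = mus[state], sigmas[state]
        # Exact log-price step of geometric Brownian motion within a state.
        log_price += (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if rng.random() < switch_rates[state] * dt:
            state = 1 - state  # jump to the other regime
    return math.exp(log_price)


if __name__ == "__main__":
    # Hypothetical calm (state 0) and turbulent (state 1) regimes.
    print(simulate_switching_gbm(100.0, (0.08, -0.02), (0.15, 0.40), (1.0, 2.0), 1.0, 252))
```

With identical drift and volatility in both states, the regime switches have no effect and the path is ordinary geometric Brownian motion, which matches the first bullet above.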

Jump Diffusion Process

• Jumps should occur in an instantaneous fashion, neglecting the possibility of a delta hedge

• Probability of any jump occurring in a particular interval of time should be approximately proportional to length of that time interval

• All future jumps should have no "memory" of past jumps

• The extra parameters ν and m represent the standard deviation of the lognormal jump process and the scale factor for jump intensity, respectively.
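One common way to price a European call under these assumptions is Merton's series, which weights Black-Scholes prices by the Poisson probability of n jumps. The sketch below assumes that form, with `lam` as the jump intensity and `m` and `nu` as described in the last bullet. The deck does not show which implementation was used, so treat the function names and the truncation at 50 terms as illustrative choices.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, t):
    """Black-Scholes price of a European call."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def merton_call(s, k, r, sigma, t, lam, m, nu, n_terms=50):
    """Merton jump-diffusion call price as a Poisson-weighted sum of BS prices."""
    lam_scaled = lam * m  # jump intensity scaled by m
    price = 0.0
    for n in range(n_terms):
        sigma_n = math.sqrt(sigma ** 2 + n * nu ** 2 / t)  # volatility given n jumps
        r_n = r - lam * (m - 1.0) + n * math.log(m) / t    # rate given n jumps
        weight = math.exp(-lam_scaled * t) * (lam_scaled * t) ** n / math.factorial(n)
        price += weight * bs_call(s, k, r_n, sigma_n, t)
    return price


if __name__ == "__main__":
    print(bs_call(100.0, 100.0, 0.05, 0.2, 1.0))
    print(merton_call(100.0, 100.0, 0.05, 0.2, 1.0, lam=1.0, m=1.0, nu=0.3))
```

Setting `lam=0` switches the jumps off, so the series collapses to the plain Black-Scholes price.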

Jump Diffusion Result Discussion

The RMSE is high (26.7) because the jump-diffusion model cannot incorporate a possible dependence structure among asset returns (the so-called “volatility clustering effect”): the model assumes independent increments. The corresponding plot is shown alongside.

Support Vector Regression

• Motivation: Find a function that predicts the dependent variable (Y) from the independent variables (X) such that the deviation between predicted and actual values is at most ε, the level up to which error is tolerated

• Methodology: Polynomial kernel and radial basis function (RBF) kernel used

• Advantage: Traditional methods use only stock price, strike price and time to maturity. SVR can also capture the effect of the risk-free interest rate and volatility, which change over time

• Conclusion: Decent RMSE (19.56) compared to the parametric methods
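The two ingredients named above can be written down directly: the ε-insensitive loss, which ignores errors up to ε, and the kernels that measure similarity between input vectors. This is only a sketch of those building blocks with assumed parameter values (`epsilon`, `gamma`, `degree`), not a full SVR solver or the configuration used in the project.

```python
import math


def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero inside the epsilon tube around the prediction, linear outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * squared distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


if __name__ == "__main__":
    print(epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1))  # within tolerance
    print(epsilon_insensitive_loss(1.0, 1.30, epsilon=0.1))  # penalised
    print(rbf_kernel([1.0, 2.0], [1.0, 2.5]))
```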

Symmetrized Nearest Neighbour

• Semi-parametric estimation of liquidity effects on option pricing

• When estimating the nonparametric volatility function in moneyness intervals where the amount of data is relatively small, a multivariate kernel based on a global smoothing parameter may lead to poor estimation results

• When estimating at one point, we calculate the weights for the rest of the observations from the distance between the values of the empirical distribution at each point, rather than the distance between the points themselves

• The empirical distribution changes the random design to a uniform design, with knots uniformly spaced between zero and one
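The weighting rule in the last two bullets can be sketched in one dimension as follows: distances are measured between empirical-CDF values, so a far-away observation counts only as one rank step. The Epanechnikov kernel, the bandwidth `h` and the toy data are assumptions chosen for the example.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n, the empirical distribution function of the sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def snn_estimate(x0, xs, ys, h):
    """Symmetrized nearest-neighbour estimate of E[y | x = x0]."""
    cdf = empirical_cdf(xs)
    u0 = cdf(x0)
    numerator = denominator = 0.0
    for x, y in zip(xs, ys):
        u = (cdf(x) - u0) / h  # distance measured on the CDF scale
        weight = max(0.0, 0.75 * (1.0 - u * u))  # Epanechnikov kernel
        numerator += weight * y
        denominator += weight
    return numerator / denominator if denominator > 0 else float("nan")


if __name__ == "__main__":
    moneyness = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical, with one outlier
    implied_vol = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(snn_estimate(3.0, moneyness, implied_vol, h=0.5))
```

Because the outlier at 100 sits only one rank above 4, it receives the same weight as the observation at 1, which is the "uniform design" effect described above.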

Symmetrized Nearest Neighbour Regression Results

Inputs:

• Moneyness

• Bid-ask spread

• Days until Expiration (T)

• Volatility: the calculated volatility was given as input to the BS model

Conclusion: The in-sample performance of the model is better than that of a competing model without liquidity. However, the out-of-sample performance is quite disappointing, resulting in a high MSE.

K Nearest Neighbour Implementation and Results

• Non-parametric algorithm that can be used for either classification or regression.

• For each data point, the algorithm finds the k closest observations and then assigns the data point to the majority class among them.

• The data set was encoded so that the KNN classifier could be fitted to it: each tick was compared with the previous tick and encoded as 1 if the value had increased, 0 otherwise

• Accuracy was checked, and a corresponding confusion matrix was created

• The highest accuracy was found for K=5
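The encoding and the majority vote described above can be sketched in a few lines. The feature vectors below are hypothetical, since the deck does not say which features were fed to the classifier, and squared Euclidean distance is an assumed choice of metric.

```python
from collections import Counter


def encode_ticks(prices):
    """Encode each tick as 1 if it rose relative to the previous tick, else 0."""
    return [1 if current > previous else 0
            for previous, current in zip(prices, prices[1:])]


def knn_predict(train_x, train_y, query, k=5):
    """Classify query by majority vote among its k nearest training points."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(encode_ticks([1.0, 2.0, 2.0, 1.0, 3.0]))
    features = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]  # hypothetical
    labels = [0, 0, 0, 1, 1, 1]
    print(knn_predict(features, labels, (0.05,), k=3))
```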

Artificial Neural Network: Conjugate Gradient

MOTIVATION:

• Results suggest that machine learning, using several Multilayer Perceptron models, can serve as a basis for effective option investment strategies.

• Most existing models in finance for predicting the price of an option revolve around the Black-Scholes model; however, these models tend to involve highly complex mathematics and often make many assumptions about the underlying characteristics of the market.

ADVANTAGES:

• Machine learning techniques do not make any implicit assumptions about the relationships between input variables. Using an Artificial Neural Network, we let the learner discover relationships that may not be included in standard models like Black-Scholes.

Artificial Neural Network: Conjugate Gradient (Cont…)

METHODOLOGY:

From the Bloomberg Terminal, we obtained historical data for the following attributes:

• Strike Price (X)

• Underlying Stock Price (S)

• 10-day Historical Volatility

• 30-day Historical Volatility

• Days until Expiration (T)

• The Market Price of the Option (P)

• The 10 previous days of stock prices

• Risk-free rate

• Expiration Price of Stock (E)

The ANN was then applied using these parameters as inputs.
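Two of the inputs listed above are historical volatilities. A common way to compute them is the annualized standard deviation of daily log returns over the window, sketched below. The 252 trading days per year and the toy prices are assumptions, and Bloomberg's own calculation may differ in detail.

```python
import math
import statistics


def historical_volatility(prices, periods_per_year=252):
    """Annualized standard deviation of log returns over the given prices."""
    log_returns = [math.log(later / earlier)
                   for earlier, later in zip(prices, prices[1:])]
    return statistics.stdev(log_returns) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical closing prices: 11 prices give 10 daily returns.
    closes = [100.0, 101.0, 99.5, 100.5, 102.0, 101.0, 103.0, 102.5, 104.0, 103.0, 105.0]
    print(historical_volatility(closes))
```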

Artificial Neural Network: Conjugate Gradient (Cont…)

CONCLUSIONS:

Using an Artificial Neural Network on the test data for option pricing gives a more accurate result than the other base learner models. For further improvement, the following parameters can also be added as inputs to the model:

• Measures of Volatility

• Previous Stock Prices

Although ANN training with Conjugate Gradient is a well-known and widely used machine learning method, it does not give a very accurate result here, because Conjugate Gradient does not damp the least-squares error. We therefore switch to Levenberg-Marquardt optimization of the neural network, which addresses this problem.

Artificial Neural Network: Levenberg-Marquardt

• Mainly used to solve non-linear least squares problems.

• Finds a local minimum, which is not necessarily the global minimum.

• The Levenberg-Marquardt algorithm (LMA) interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent.
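The interpolation between Gauss-Newton and gradient descent is controlled by a damping factor. The sketch below shows it on a toy least-squares problem, fitting y = a·exp(b·x), rather than on a neural network. The model, the starting values and the factor-of-ten damping schedule are assumptions chosen to keep the example small.

```python
import math


def levenberg_marquardt(xs, ys, a, b, damping=1e-3, iterations=100):
    """Fit y = a * exp(b * x) by damped least squares."""

    def cost(a_, b_):
        return sum((y - a_ * math.exp(b_ * x)) ** 2 for x, y in zip(xs, ys))

    current = cost(a, b)
    for _ in range(iterations):
        # Accumulate J^T J and J^T r for the two parameters.
        h_aa = h_ab = h_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            residual = y - a * e
            j_a, j_b = e, a * x * e  # partial derivatives of the model
            h_aa += j_a * j_a
            h_ab += j_a * j_b
            h_bb += j_b * j_b
            g_a += j_a * residual
            g_b += j_b * residual
        # Damped normal equations: (J^T J + damping * diag(J^T J)) step = J^T r
        m_aa, m_bb = h_aa * (1 + damping), h_bb * (1 + damping)
        det = m_aa * m_bb - h_ab * h_ab
        if det == 0.0:
            break
        step_a = (g_a * m_bb - g_b * h_ab) / det
        step_b = (m_aa * g_b - h_ab * g_a) / det
        trial = cost(a + step_a, b + step_b)
        if trial < current:
            a, b, current = a + step_a, b + step_b, trial
            damping /= 10.0  # trust the Gauss-Newton step more
        else:
            damping = min(damping * 10.0, 1e10)  # lean towards gradient descent
    return a, b


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [2.0 * math.exp(0.5 * x) for x in xs]  # synthetic, noise-free data
    print(levenberg_marquardt(xs, ys, a=1.0, b=0.1))
```

A rejected step raises the damping, which shortens the step and turns it towards the gradient direction. An accepted step lowers the damping, so the method behaves like Gauss-Newton near the minimum.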

TRADING

We have to deal with …

◉ Market risk (price appreciation, timing risk): grows when we trade slowly

◉ Market impact: grows when we trade fast

We want to balance market risk and market impact.

DISCRETE TRADING MODEL

◉ Trading is possible at N discrete times

◉ No interest on cash position

◉ A trading strategy is given by $(x_i)_{i=0..N+1}$, where $x_k$ = number of units held at t = k (i.e. we sell $n_k = x_k - x_{k+1}$ at price $S_k$)

◉ Boundary conditions: $x_0 = X$ and $x_{N+1} = 0$

◉ Price dynamics:

Exogenous: Arithmetic Random Walk, $S_k = S_{k-1} + (\varepsilon_k + \ldots)$, k = 1..N, with $\varepsilon_k \sim N(0,1)$ i.i.d.

Endogenous: Market Impact
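A small simulation can make the bookkeeping of this model concrete. The sketch below sells X units along a linear schedule while the price follows an arithmetic random walk. The volatility scale `sigma`, the absence of drift and the linear temporary impact term `eta * n_k` are assumptions, since the slide does not spell out the impact function.

```python
import random


def linear_schedule(total_units, n_steps):
    """Holdings x_0..x_{N+1}, falling linearly from X to 0."""
    return [total_units * (n_steps + 1 - k) / (n_steps + 1)
            for k in range(n_steps + 2)]


def simulate_proceeds(holdings, s0, sigma=1.0, eta=0.0, rng=None):
    """Cash received from selling n_k = x_k - x_{k+1} at each time k."""
    rng = rng or random.Random(0)
    price = s0
    proceeds = 0.0
    for k in range(len(holdings) - 1):
        if k > 0:
            price += sigma * rng.gauss(0.0, 1.0)  # exogenous random walk step
        units_sold = holdings[k] - holdings[k + 1]
        # Hypothetical endogenous effect: selling more lowers the price received.
        proceeds += units_sold * (price - eta * units_sold)
    return proceeds


if __name__ == "__main__":
    schedule = linear_schedule(100.0, 4)
    print(schedule)
    print(simulate_proceeds(schedule, s0=50.0, sigma=0.5, eta=0.1))
```

Selling in fewer, larger blocks raises the impact cost through `eta`, while spreading the trades over more steps exposes more of the position to the random walk. This is the balance between market impact and market risk.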

HIDDEN MARKOV MODEL

◉ Decide the “hidden” states: up trend, mean reverting, down trend

◉ UP: μ >> 0

◉ MEAN REVERTING: μ ≈ 0

◉ DOWN: μ << 0

◉ Transition probabilities between the three states: $p_{ij}$ for i, j = 1, 2, 3
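To show how the three hidden states and the transition probabilities p_ij work together, the sketch below runs a forward filter: it updates the probability of each state after every observed return. The transition matrix, the state means and the common standard deviation are hypothetical values, and Gaussian emissions are an assumption; the deck gives no numbers.

```python
import math

STATES = ("up", "mean_reverting", "down")
# Hypothetical transition matrix: TRANSITIONS[i][j] is p_ij, each row sums to 1.
TRANSITIONS = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.02, 0.08, 0.90],
]
MEANS = (0.5, 0.0, -0.5)  # hypothetical drift per state
STD = 0.2                 # hypothetical common standard deviation


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filter_states(observations, prior=None):
    """Return P(state | observations so far) via the forward algorithm."""
    probs = list(prior) if prior else [1.0 / 3.0] * 3
    for obs in observations:
        # Predict: propagate the state probabilities through the transitions.
        predicted = [sum(probs[i] * TRANSITIONS[i][j] for i in range(3))
                     for j in range(3)]
        # Update: weight each state by how well it explains the observation.
        weighted = [predicted[j] * gaussian_pdf(obs, MEANS[j], STD)
                    for j in range(3)]
        total = sum(weighted)
        probs = [w / total for w in weighted]
    return probs


if __name__ == "__main__":
    posterior = filter_states([0.6, 0.4, 0.7, 0.5])
    print(dict(zip(STATES, posterior)))
```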

TECHNICAL INDICATORS: BOLLINGER BANDS & STOCHASTIC OSCILLATORS

Stochastic Oscillator

◉ Offers a measurement of the deviation of a currency pair’s rate (price) from its normal levels

◉ Offers indications of when a currency pair is overbought/oversold

◉ Works well in markets that are not trending, but rather just fluctuating back and forth between an upper level (resistance) and a lower level (support)

Bollinger Bands

◉ Excellent range-bound indicator that measures standard deviation from the moving average

◉ Operates under the logic that a currency pair’s price is most likely to gravitate towards its average, and hence when it strays too far (such as two standard deviations away) it is due to retrace back to its moving average

Bollinger Bands plot three bands on a price chart to create two price channels. The security is said to be overbought if the price line is consistently near or breaches the upper price band. It may be oversold if the price line is consistently near or drops below the lower price band.

Charts: Bollinger Bands with EMA and with SMA as the moving average.

An overbought position is confirmed if the stochastic lines cross above 80 and, at the same time, the price line is consistently near the upper Bollinger Band. At that level, prices are expected to drop soon. The opposite is also true.

Charts: MA Type = EMA and MA Type = SMA.
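Both indicators reduce to short formulas, sketched here with a simple moving average (SMA) as the middle band. The window lengths, the two-standard-deviation width and the use of the population standard deviation are conventional defaults assumed for illustration. An EMA variant would replace the mean with an exponentially weighted one.

```python
import statistics


def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (lower, middle, upper) bands from the last `window` closes."""
    recent = closes[-window:]
    middle = statistics.mean(recent)  # SMA as the middle band
    width = num_std * statistics.pstdev(recent)
    return middle - width, middle, middle + width


def stochastic_k(highs, lows, closes, window=14):
    """%K: position of the last close within the recent high-low range (0-100)."""
    highest = max(highs[-window:])
    lowest = min(lows[-window:])
    if highest == lowest:
        return 50.0  # flat range: report the midpoint
    return 100.0 * (closes[-1] - lowest) / (highest - lowest)


def is_overbought(close, upper_band, k_value, threshold=80.0):
    """Overbought when %K is above the threshold and price is at the upper band."""
    return k_value > threshold and close >= upper_band


if __name__ == "__main__":
    closes = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical prices
    print(bollinger_bands(closes, window=5))
    print(stochastic_k([10.0, 12.0, 11.0], [8.0, 9.0, 9.0], [9.0, 11.0, 10.0], window=3))
```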

STRATEGY IS SIMPLE

The first condition you are looking for is a candle breaking the UPPER or LOWER Bollinger Band.

Look for the stochastics to have traveled above the 80 line (for a bullish candle traveling outside the upper Bollinger Band) or below the 20 line (for a bearish candle traveling outside the lower Bollinger Band).

Wait for the next candle to form before you get into the trade.

Once the next candle has formed, the stochastic lines should have crossed and be heading back towards the white line.

If the stochastic lines have not yet crossed, or they are moving further apart, do not take the trade; wait until all of the conditions are met.

If the conditions have been met, place a trade in the opposite direction of the previous candle.
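The rules above can be read as a two-candle check, sketched below. This is one possible interpretation: the `Candle` fields are hypothetical names, the band and stochastic values are assumed to be precomputed, and "the stochastic lines have crossed" is taken to mean %K crossing %D.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candle:
    open: float
    close: float
    upper_band: float
    lower_band: float
    k: float  # stochastic %K
    d: float  # stochastic %D


def reversal_signal(previous: Candle, current: Candle) -> Optional[str]:
    """Return 'sell', 'buy' or None once the candle after the breakout has formed."""
    bullish_break = previous.close > previous.upper_band and previous.close > previous.open
    bearish_break = previous.close < previous.lower_band and previous.close < previous.open

    if bullish_break and previous.k > 80:
        # %K must have crossed below %D on the new candle.
        crossed_down = previous.k >= previous.d and current.k < current.d
        return "sell" if crossed_down else None
    if bearish_break and previous.k < 20:
        # %K must have crossed above %D on the new candle.
        crossed_up = previous.k <= previous.d and current.k > current.d
        return "buy" if crossed_up else None
    return None  # no band break, so no trade


if __name__ == "__main__":
    breakout = Candle(open=100.0, close=106.0, upper_band=105.0, lower_band=95.0, k=90.0, d=85.0)
    follow_up = Candle(open=106.0, close=104.0, upper_band=105.0, lower_band=95.0, k=82.0, d=86.0)
    print(reversal_signal(breakout, follow_up))
```

Returning `None` whenever a condition is missing mirrors the instruction to wait until all of the conditions are met.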

PERFORMANCE INDICATOR

Sharpe Ratio: the ratio of yearly return to yearly volatility

SHARPE RATIO TABLE TO COMPARE ALGORITHMS

Algorithm        Value
DataSet          -0.00029266783
Black Scholes    -0.1735110594
Jump Process     -0.173409357
SNN & SVR        -0.00012865643
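For reference, a Sharpe ratio of this kind can be computed from a series of per-period returns as sketched below. The annualization by 252 periods, the optional risk-free rate and the sample returns are assumptions; the deck does not state how the values in the table were obtained.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return divided by annualized volatility."""
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    daily_returns = [0.01, -0.01, 0.02, 0.0]  # hypothetical strategy returns
    print(sharpe_ratio(daily_returns))
```

A negative value, as in every row of the table, means the average return was below the benchmark rate over the period.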

PROFIT AND LOSS CURVE

CONCLUSIONS

• Out-of-sample performance is not comparable to in-sample performance, regardless of which option pricing model is employed in the estimation

• The Artificial Neural Network (Feed Forward) model gives the best result among the forecasting tools

• Semi-parametric implied volatility estimation is more effective than BS implied volatility

• Non-parametric methods give better accuracy than parametric methods

• SVR takes less time and gives a decent result among the non-parametric methods

• ANN is able to capture the effect of many more variables, such as dividends and historical volatilities, but its performance depends largely on the volatility of the input data

REFERENCES

• Machine Learning for Market Microstructure and High Frequency Trading

• Quest for Efficient Option Pricing Prediction model using Machine Learning Techniques - B.V. Phani, B. Chandra, Vijay Raghav

• A Semiparametric Estimation of Liquidity Effects on Option Pricing - Eva Ferreira

• http://www.platonniaga.com/downloads/ea-documents/Lesson%205%20Stochastics%20and%20Bollinger%20Bands.pdf

THANKS!
