Predicting Stock Market Price Using Support Vector Regression
TRANSCRIPT
Predicting Stock Market Price Using Support Vector Machine with Different Kinds of Windows
Master's Thesis Presentation
By: Risul Islam Rasel, Student ID: 5407011866512, Major Field: I-MIT
Advisor: Assoc. Prof. Dr. Phayung Meesad
Agenda
• Introduction: purpose of the study, scope of the study
• Literature review: time series prediction, SVR, windowing, some recent research work
• Experiment design: data collection, data preprocessing, workflow diagram, model tree structure
• Results: windowing parameter values, SVR kernel function parameter values, model result analysis, error calculation
• Conclusion
Introduction
• The stock exchange is an emerging business sector that has become increasingly popular:
- many people and organizations are involved in this business;
- gaining insight into the market trend has become an important factor.
• Stock trend or price prediction is regarded as a challenging task because the market is:
- essentially non-linear and non-parametric,
- noisy,
- a deterministically chaotic system.
• Why a deterministically chaotic system?
- Liquid money and stock adequacy
- Human behavior and news related to the stock market
- Share gambling
- Money exchange rates, etc.
Purpose of the study
• To propose a stock market time series prediction model combining support vector machine regression (SVR) and a windowing operator.
• To apply the proposed model to historical data sets from different stock markets.
• To evaluate the model's predictions against real data sets from stock markets in order to measure prediction accuracy.
Scope of the study
• To develop a model that can raise early warnings of financial crisis in the stock market, as well as give insight into the current market trend.
• To propose a model that can be applied to different stock indices in order to predict stock prices. For this study, data was collected from the Dhaka Stock Exchange (DSE), Bangladesh, so that the research results can be compared and evaluated. Four years (2009-2012) of historical data were collected and separated into two groups: a training data set (2009-2011) and a testing data set (2012).
• To compare the prediction results across time series data from different stock indices in order to evaluate performance.
Literature review
Time Series Prediction
• A time series is a sequence of data points, typically measured at successive points in time at uniform intervals.
• Examples of time series are the daily closing values of a stock index, daily exchange rates, daily rainfall, the flow volume of a river, etc.
• A time series analysis consists of two steps:
(1) building a model that represents the time series, and
(2) using the model to predict (forecast) future values.
• Forecasting systems are usually fed some time series members from the last several days, and the next day's closing price is obtained at the system output, i.e.
Close[t-n], Close[t-n+1], …, Close[t-1], Close[t] → Close[t+1]
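The input/output pairing above can be sketched in plain Python, assuming a simple list of closing prices (the function name and data are illustrative, not from the thesis):

```python
# A sketch of the input/output pairing above, assuming a plain Python list
# of closing prices (function name and sample data are illustrative).
def make_supervised(close, n):
    """Pair [Close[t-n+1], ..., Close[t]] with the target Close[t+1]."""
    pairs = []
    for t in range(n - 1, len(close) - 1):
        window = close[t - n + 1:t + 1]       # the last n observed closes
        pairs.append((window, close[t + 1]))  # next day's close as target
    return pairs

pairs = make_supervised([10, 11, 12, 13, 14], n=3)
# pairs[0] == ([10, 11, 12], 13)
```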
Figure 1: Stock Market Time Series (open, high, low, and close values plotted against time in days)
Figure 2: Time Series Prediction Process
Support Vector Machine Regression
• Support vector machine (SVM) is an artificial intelligence-based method developed from statistical learning theory.
• SVM has two major features: classification (SVC) and regression (SVR).
• In SVM regression, the input is first mapped onto an m-dimensional feature space using some fixed (nonlinear) mapping, and then a linear model is constructed in this feature space.
• A margin of tolerance (epsilon) is set in the approximation.
• This type of function is often called an epsilon-insensitive loss function.
• Slack variables are used to overcome noise in the data and non-separability.
Figure 3: Linear and Nonlinear SVR
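A minimal scikit-learn sketch of epsilon-SVR with an RBF kernel on synthetic noisy data illustrates the ideas above; the library choice, data, and parameter values are assumptions for illustration, not the thesis setup:

```python
# Sketch of epsilon-SVR on synthetic noisy data (library choice and
# parameters are illustrative assumptions, not the thesis setup).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # noisy nonlinear signal

# The RBF kernel performs the fixed nonlinear mapping to feature space;
# epsilon sets the tolerance tube, and C penalizes points left outside it
# (handled via slack variables).
model = SVR(kernel="rbf", C=100, epsilon=0.1, gamma=1.0)
model.fit(X, y)
pred = model.predict([[1.5]])[0]  # should land near sin(1.5)
```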
Windowing operator:
• transforms the time series data into a generic data set;
• converts the last row of a window within the time series into a label or target variable;
• feeds the cross-sectional values as inputs to a machine learning technique such as linear regression, a neural network, or a support vector machine.
• Parameters:
- horizon (h)
- window size
- step size
- training window width
- testing window width
Figure 4: converting time series to windowed data
• Normal rectangular windowing
• The single attribute close is selected; the window size is 3 and the horizon is 1.
• So the windowed attributes close-0/close-1/close-2 are generated.
• label = (window size + horizon)th value = (3 + 1) = 4th value
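The rectangular windowing described above can be sketched in plain Python with window size 3, horizon 1, and step size 1 on the single attribute "close" (illustrative code, not the actual windowing operator):

```python
# Sketch of rectangular windowing: window size 3, horizon 1, step size 1,
# on the single attribute "close" (illustrative, not the actual operator).
def window_series(close, window_size=3, horizon=1, step=1):
    rows = []
    for start in range(0, len(close) - window_size - horizon + 1, step):
        w = close[start:start + window_size]              # close-2 .. close-0
        label = close[start + window_size + horizon - 1]  # (WS + Hz)th value
        rows.append({"close-2": w[0], "close-1": w[1],
                     "close-0": w[2], "label": label})
    return rows

rows = window_series([100, 101, 102, 103, 104])
# rows[0] == {"close-2": 100, "close-1": 101, "close-0": 102, "label": 103}
```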
* Since horizon = 1, close-1 is all 0 and close-0 is deleted.
* new close-2 = (old close-2 − old close-1).
* new close-0 = old close-0 − old close-1 (since horizon = 1).
* Since close-0 (old) is selected as base_value, base value = old close-0 − new close-0.
• First, it removes all attributes lying between time point zero (attribute names ending in "-0") and the time point before the horizon values.
• Second, it transforms the corresponding time point zero of the specified label stem into the actual label.
• Last, it re-represents all values relative to the last known time value for each original dimension, including the label value.
• Flatten windowing
• De-flatten windowing
• It adds the value of the base_value special attribute to both the label and the predicted label (if available), so the original data range is restored.
• After that, it removes the base_value special attribute.
* label + close-0 = label_original (since close-0 = base_value)
* close-0 is removed, since it was selected as the base value.
Original time series data → normal windowed data set → flatten windowed data set → de-flatten windowed data set
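One plausible reading of the flatten/de-flatten steps above, in plain Python (the attribute names and the relative-to-base differencing are illustrative assumptions; the original operators may differ in detail):

```python
# One plausible reading of flatten / de-flatten (attribute names and the
# relative-to-base differencing are illustrative assumptions).
def flatten_row(close_2, close_1, close_0, label):
    base = close_0  # the last known value becomes the base_value attribute
    return {"close-2": close_2 - base, "close-1": close_1 - base,
            "label": label - base, "base_value": base}

def deflatten_label(flat_label, base):
    return flat_label + base  # label + base_value restores label_original

row = flatten_row(100, 101, 102, 103)
restored = deflatten_label(row["label"], row["base_value"])  # gives back 103
```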
Some recent research works:
1. "Stock Forecasting Using Support Vector Machine"
• Authors: Lucas K. C. Lai, James N. K. Liu
• Applied techniques: SVM and NN
• Data preprocessing techniques: exponential moving average (EMA15) and relative difference in percentage of price (RDP)
• Domain: Hong Kong Stock Exchange
2. "Stock Index Prediction: A Comparison of MARS, BPN and SVR in an Emerging Market"
• Authors: Chi-Jie Lu, Chih-Hsiang Chang, Chien-Yu Chen, Chih-Chou Chiu, Tian-Shyug Lee
• Applied techniques: multivariate adaptive regression splines (MARS), back-propagation neural network (BPN), support vector regression (SVR), and multiple linear regression (MLR)
• Domain: Shanghai B-share stock index
3. "An Improved Support Vector Regression Modeling for Taiwan Stock Exchange Market Weighted Index Forecasting"
• Authors: Kuan-Yu Chen, Chia-Hui Ho
• Applied techniques: SVR, GA, autoregression (AR)
• Domain: Taiwan Stock Exchange
Methodology
Data collection
• The experiment data set was collected from the Dhaka Stock Exchange (DSE), Bangladesh.
• Four years (January 2009 - June 2012) of historical data were collected.
• Almost 522 companies are listed on the DSE, but for the convenience of the experiment only one well-known company's data was selected.
• The data set had 6 attributes: date, open price, high price, low price, close price, and volume.
• 5 attributes were used in the experiment; volume was excluded.
• In total, 822 days of data: 700 records were used as the training data set and 122 records as the testing data set.
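The chronological split described above can be sketched with a plain Python list standing in for the 822 daily records (the real DSE file and its format are not reproduced here):

```python
# A sketch of the split above: first 700 records for training, last 122
# for testing (a plain list stands in for the real DSE price file).
records = list(range(822))  # stand-in for 822 days of price data
train, test = records[:700], records[700:]
# 700 training records, 122 testing records
```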
Sample raw data (multiple companies, January 25, 2009):

| Company | Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| ATLASBANG | January 25, 2009 | 350 | 354.9 | 335 | 339.8 | 76,950.00 |
| ASIAPACINS | January 25, 2009 | 215.5 | 225 | 215.5 | 220.5 | 9,100.00 |
| ARAMIT | January 25, 2009 | 267.2 | 270.1 | 267.1 | 268.6 | 3,950.00 |
| APEXTANRY | January 25, 2009 | 1060 | 1083 | 1042 | 1051.5 | 37,690.00 |
| APEXSPINN | January 25, 2009 | 472 | 500 | 472 | 484.25 | 380.00 |
| APEXFOODS | January 25, 2009 | 881.25 | 907 | 874 | 885 | 4,645.00 |
| APEXADELFT | January 25, 2009 | 2220 | 2250 | 2185 | 2200.25 | 7,220.00 |
| AMCL(PRAN) | January 25, 2009 | 1150 | 1203 | 1145 | 1181 | 6,960.00 |
| AMBEEPHA | January 25, 2009 | 132 | 139.9 | 130 | 138.3 | 50,800.00 |
| ALARABANK | January 25, 2009 | 419 | 419.25 | 410 | 410.75 | 22,750.00 |
| AIMS1STMF | January 25, 2009 | 15.05 | 15.15 | 14.82 | 14.89 | 2,417,500.00 |
| AGNISYSL | January 25, 2009 | 65 | 66.1 | 63 | 63.3 | 264,500.00 |
| AFTABAUTO | January 25, 2009 | 418.5 | 448 | 418.5 | 442.25 | 155,090.00 |
| ACI | January 25, 2009 | 518 | 538.5 | 514 | 533.1 | 289,100.00 |
| ABBANK | January 25, 2009 | 759.75 | 759.75 | 732.5 | 734.25 | 34,740.00 |
| 8THICB | January 25, 2009 | 495 | 504 | 495 | 498.5 | 250.00 |
| 7THICB | January 25, 2009 | 710 | 710 | 680 | 702.75 | 250.00 |
| 6THICB | January 25, 2009 | 543.75 | 544 | 520 | 524 | 2,840.00 |
| 5THICB | January 25, 2009 | 1005 | 1006 | 999 | 1001.5 | 60.00 |
| 4THICB | January 25, 2009 | 1040 | 1040 | 1010 | 1031.5 | 60.00 |
| 2NDICB | January 25, 2009 | 1782 | 1782 | 1781 | 1781.5 | 15.00 |
| 1STICB | January 25, 2009 | 5100 | 5100 | 5100 | 5100 | 10.00 |
| 1STBSRS | January 25, 2009 | 776 | 790 | 767 | 781.25 | 2,600.00 |

Sample raw data (ACI, January-February 2009):

| Company | Date | Open | High | Low | Close |
|---|---|---|---|---|---|
| ACI | February 25, 2009 | 475 | 477.9 | 463 | 469.1 |
| ACI | February 23, 2009 | 480 | 488 | 476.2 | 479.2 |
| ACI | February 22, 2009 | 479 | 484.9 | 472.2 | 478.5 |
| ACI | February 19, 2009 | 499.9 | 502 | 478 | 483.5 |
| ACI | February 18, 2009 | 484 | 500 | 483 | 494 |
| ACI | February 17, 2009 | 473 | 485 | 472.1 | 477.1 |
| ACI | February 16, 2009 | 476 | 479.5 | 470 | 472.3 |
| ACI | February 15, 2009 | 485 | 490 | 474 | 475.3 |
| ACI | February 12, 2009 | 477 | 485 | 469 | 481.5 |
| ACI | February 11, 2009 | 488 | 488 | 460 | 463.8 |
| ACI | February 10, 2009 | 472.1 | 485.9 | 463 | 481.4 |
| ACI | February 9, 2009 | 489 | 491 | 471.1 | 475.6 |
| ACI | February 8, 2009 | 508 | 508 | 485 | 489.3 |
| ACI | February 5, 2009 | 495.2 | 500.9 | 490.4 | 493.5 |
| ACI | February 4, 2009 | 504 | 505.9 | 492.1 | 495.2 |
| ACI | February 3, 2009 | 500 | 500 | 489.1 | 493.3 |
| ACI | February 2, 2009 | 511 | 512.9 | 497 | 498.5 |
| ACI | February 1, 2009 | 504.2 | 512 | 504 | 505.3 |
| ACI | January 29, 2009 | 515.5 | 521.9 | 506.2 | 508.7 |
| ACI | January 28, 2009 | 509 | 519.9 | 509 | 514.5 |
| ACI | January 27, 2009 | 515 | 519 | 507.1 | 508.8 |
| ACI | January 26, 2009 | 538.5 | 541 | 515 | 516.5 |
| ACI | January 25, 2009 | 518 | 538.5 | 514 | 533.1 |
Training phase
Step 1: Read the training data set from the local repository.
Step 2: Apply the windowing operator to transform the time series data into a generic data set. This step converts the last row of a window within the time series into a label or target variable; the last variable is treated as the label.
Step 3: Run a cross-validation process over the windowed data produced by the windowing operator in order to feed it as input into the SVR model.
Step 4: Select the kernel type and the special parameters of the SVR (C, epsilon (+/-), g, etc.).
Step 5: Run the model and observe the performance (accuracy).
Step 6: If the performance is good, go to step 7; otherwise go back to step 4.
Step 7: Exit the training phase and apply the trained model to the testing data set.
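Steps 3-6 above can be sketched with scikit-learn's cross-validated grid search; this is an illustrative assumption (the thesis used a different toolchain), and the synthetic random-walk series stands in for the real prices:

```python
# Sketch of training-phase steps 3-6 with cross-validated grid search
# (illustrative: synthetic random-walk data, assumed parameter grid).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

close = 100 + np.cumsum(np.random.default_rng(1).normal(0, 1, 120))
X = np.array([close[i:i + 3] for i in range(len(close) - 3)])  # window size 3
y = close[3:]                                                  # horizon 1

# Steps 3-4: cross-validate candidate kernel parameters (C, epsilon).
search = GridSearchCV(SVR(kernel="rbf"),
                      {"C": [100, 1000, 10000], "epsilon": [0.1, 1.0]},
                      cv=5, scoring="neg_mean_absolute_error")
# Steps 5-6: fitting evaluates every combination and keeps the best one.
search.fit(X, y)
best = search.best_params_  # carried forward to the testing phase
```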
Testing phase
Step 1: Read the testing data set from the local repository.
Step 2: Apply the trained model to the out-of-sample data set for price prediction.
Step 3: Produce the predicted trends and stock prices.
[Workflow diagram: training phase and testing phase, each consisting of data pre-processing (windowing) followed by machine learning (SVR)]
Results
Data pre-process & optimized input selection

| Windowing Name | Model | Window size | Step size | Training window | Testing window |
|---|---|---|---|---|---|
| Rectangular | All | 3 | 1 | 30 | 30 |
| Flatten window | 1 day | 3 | 1 | 30 | 30 |
| Flatten window | 5 days | 8 | 1 | 30 | 30 |
| Flatten window | 22 days | 25 | 1 | 30 | 30 |
| De-flatten window | All | 5 | 1 | 30 | 30 |
SVR kernel function parameter settings

| SVR Model | Kernel | C | g | ε | ε+ | ε- |
|---|---|---|---|---|---|---|
| 1 day a-head | RBF | 10000 | 1 | 2 | 1 | 1 |
| 5 days a-head | RBF | 10000 | 1 | 2 | 1 | 1 |
| 22 days a-head | RBF | 10000 | 1 | 2 | 1 | 1 |
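As a quick numeric sanity check of the RBF kernel referenced above, K(x, x') = exp(-g · ||x − x'||²) can be written out directly, here with g = 1 (an illustrative helper, not code from the thesis):

```python
# Numeric check of the RBF kernel: K(x, x') = exp(-g * ||x - x'||^2),
# with g = 1 (illustrative helper, not thesis code).
import math

def rbf_kernel(x, xp, g=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-g * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # identical points give 1.0
print(rbf_kernel([0.0], [3.0]))            # distant points decay toward 0
```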
Model result analysis — monthly actual vs. predicted close price (error = actual − predicted):

Rectangular window:
| Month | 1 day: Actual | Predicted | Error | 5 days: Actual | Predicted | Error | 22 days: Actual | Predicted | Error |
|---|---|---|---|---|---|---|---|---|---|
| Jan'12 | 3846.7 | 3924.9 | -78.2 | 2973.6 | 3115.8 | -142.2 | — | — | — |
| Feb'12 | 3310.9 | 4252.3 | -941.4 | 3310.9 | 3849.9 | -539.0 | 2976.2 | 3387.5 | -411.3 |
| Mar'12 | 4015.8 | 4109.4 | -93.6 | 4015.8 | 3994.9 | 20.9 | 4015.8 | 3280.6 | 735.2 |
| Apr'12 | 5279.5 | 5182.1 | 97.4 | 5279.5 | 5055.1 | 224.4 | 5279.5 | 4253.6 | 1025.9 |
| May'12 | 4315.7 | 4613.3 | -297.6 | 4315.7 | 4763.1 | -447.4 | 4315.7 | 4505.4 | -189.7 |
| Jun'12 | 3437.0 | 4513.2 | -1076.2 | 3437.0 | 4004.1 | -567.1 | 3437.0 | 4300.0 | -863.0 |

Flatten window:
| Month | 1 day: Actual | Predicted | Error | 5 days: Actual | Predicted | Error | 22 days: Actual | Predicted | Error |
|---|---|---|---|---|---|---|---|---|---|
| Jan'12 | 3846.7 | 3883.7 | -37.0 | 2973.6 | 3112.3 | -138.7 | — | — | — |
| Feb'12 | 3310.9 | 3306.5 | 4.4 | 3310.9 | 3319.7 | -8.8 | 2976.2 | 3494.8 | -518.6 |
| Mar'12 | 4015.8 | 3976.7 | 39.1 | 4015.8 | 3899.6 | 116.2 | 4015.8 | 3482.5 | 533.3 |
| Apr'12 | 5279.5 | 5242.2 | 37.3 | 5279.5 | 5139.4 | 140.1 | 5279.5 | 4413.4 | 866.1 |
| May'12 | 4315.7 | 4417.0 | -101.3 | 4315.7 | 4601.0 | -285.3 | 3958.6 | 4321.1 | -362.5 |
| Jun'12 | 3437.0 | 3447.8 | -10.8 | 2604.1 | 2651.1 | -47.0 | 3437.0 | 4417.0 | -980.0 |

De-flatten window:
| Month | 1 day: Actual | Predicted | Error | 5 days: Actual | Predicted | Error | 22 days: Actual | Predicted | Error |
|---|---|---|---|---|---|---|---|---|---|
| Jan'12 | 3410.7 | 6680.7 | -3270.0 | 2551.7 | 4721.2 | -2169.5 | — | — | — |
| Feb'12 | 3310.9 | 9756.1 | -6445.2 | 3310.9 | 12826.9 | -9516.0 | 5506.4 | 2981.4 | 2525.0 |
| Mar'12 | 4015.8 | 7778.7 | -3762.9 | 4015.8 | 8705.9 | -4690.1 | 4015.8 | 10189.4 | -6173.6 |
| Apr'12 | 5279.5 | 10330.9 | -5051.4 | 5279.5 | 8832.3 | -3552.8 | 5279.5 | 8072.5 | -2793.0 |
| May'12 | 4315.7 | 9527.1 | -5211.4 | 4315.7 | 8535.5 | -4219.8 | 4315.7 | 8890.2 | -4574.5 |
| Jun'12 | 3437.0 | 10381.4 | -6944.4 | 3437.0 | 13218.6 | -9781.6 | 3437.0 | 9310.5 | -5873.5 |
[Chart: 1 Day a-head model's results for DSE using Flatten window — actual close (A) vs. predicted close (P), close price (BDT) over days, Jan-Jun 2012]
[Chart: 5 Days a-head model's results for DSE using Flatten window — actual close (A) vs. predicted close (P), close price (BDT) over days, Jan-Jun 2012]
[Chart: 22 days a-head model's results for DSE using Flatten window — actual close (A) vs. predicted close (P), close price (BDT) over days, Jan-May 2012]
Result evaluation technique:
• Error calculation: MAPE
• MAPE (Mean Absolute Percentage Error) was used to calculate the error rate between the actual and predicted prices:

MAPE = (100 / n) * sum_{i=1}^{n} |A_i − P_i| / A_i

Here,
A = actual price
P = predicted price
n = number of data points counted

MAPE results:

| Model | Horizon | Rectangular window | Flatten window | De-flatten window |
|---|---|---|---|---|
| 1 day a-head | 1 | 0.65 | 0.08 | 6.84 |
| 5 days a-head | 5 | 0.48 | 0.19 | 8.28 |
| 22 days a-head | 22 | 0.82 | 0.74 | 5.40 |
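The MAPE formula above can be written as a small helper (illustrative, not thesis code):

```python
# The MAPE formula as a small helper (illustrative, not thesis code).
def mape(actual, predicted):
    """Mean Absolute Percentage Error between two equal-length sequences."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - p) / a for a, p in zip(actual, predicted))

print(mape([100, 200], [110, 190]))  # (10/100 + 10/200) / 2 * 100 = 7.5
```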
[Chart: MAPE for Rectangular window — monthly MAPE (error), Jan-June, for the 1 day, 5 days, and 22 days a-head models]
[Chart: MAPE for Flatten window — monthly MAPE (error), Jan-June, for the 1 day, 5 days, and 22 days a-head models]
[Chart: MAPE for De-Flatten window — monthly MAPE (error), Jan-June, for the 1 day, 5 days, and 22 days a-head models]
• Comparison with other index data (MAPE):

| Window Type | Model | S&P 500 | DSE | IBM |
|---|---|---|---|---|
| De-Flatten | 1 day a-head | 3.99 | 6.84 | 5.84 |
| De-Flatten | 5 days a-head | 3.86 | 8.28 | 6.51 |
| De-Flatten | 22 days a-head | 2.45 | 5.40 | 6.19 |
| Flatten | 1 day a-head | 0.01 | 0.08 | 0.01 |
| Flatten | 5 days a-head | 0.03 | 0.19 | 0.47 |
| Flatten | 22 days a-head | 0.14 | 0.74 | 0.21 |
| Rectangular | 1 day a-head | 0.65 | 0.65 | 0.02 |
| Rectangular | 5 days a-head | 0.74 | 0.48 | 0.57 |
| Rectangular | 22 days a-head | 1.43 | 0.82 | 3.22 |
** S&P 500 and IBM index data were collected from Google Finance: http://www.google.com/finance
Conclusion
Discussion:
• Different windowing functions can produce different prediction results.
• In this study, 3 types of windowing operators were used: the normal rectangular window, the flatten window, and the de-flatten window.
• The rectangular and flatten windows are able to produce good prediction results for time series data.
• The de-flatten window cannot produce good prediction results.
Future work:
• Apply other windowing operators.
• Compare the model's results with other machine learning techniques.
• Publications
1) P. Meesad, R. I. Rasel. "Dhaka Stock Exchange Trend Analysis Using Support Vector Regression." In: Advances in Intelligent Systems and Computing 209 (Springer), 2013, pp. 135-143.
2) Phayung Meesad, Risul Islam Rasel. "Stock Market Price Prediction Using Support Vector Regression." In: 2nd International Conference on Informatics, Electronics and Vision (ICIEV 2013), indexed in IEEE Xplore, pp. 1-6.
• Presentations
1) 9th International Conference on Computing and Information Technology (IC2IT 2013)
2) 2nd International Conference on Informatics, Electronics and Vision (ICIEV 2013)
THANK YOU