Predicting Stock Market Price Using Support Vector Regression
TRANSCRIPT
Predicting Stock Market Price Using Support Vector Machine with Different Kinds of Windows
Master's Thesis Presentation
By: Risul Islam Rasel, Student ID: 5407011866512, Major Field: I-MIT
Advisor: Assoc. Prof. Dr. Phayung Meesad
Agenda
• Introduction: purpose of the study, scope of the study
• Literature review: time series prediction, SVR, windowing, some recent research work
• Experiment design: data collection, data preprocessing, workflow diagram, model tree structure
• Results: windowing parameter values, SVR kernel function parameter values, model result analysis, error calculation
• Conclusion
Introduction
• The stock exchange is an emerging business sector that has become increasingly popular:
- many people and organizations are involved in this business;
- gaining insight into the market trend has become an important factor.
• Stock trend or price prediction is regarded as a challenging task because the market is:
- essentially non-linear and non-parametric,
- noisy,
- a deterministically chaotic system.
• Why a deterministically chaotic system?
- Liquid money and stock adequacy
- Human behavior and news related to the stock market
- Share gambling
- Money exchange rates, etc.
Purpose of the study
• To propose a stock market time series prediction model combining support vector machine regression (SVR) and a windowing operator.
• To apply the proposed model to historical data sets from different stock markets.
• To evaluate the model's predictions against real data sets from stock markets in order to measure prediction accuracy.
Scope of the study
• To develop a model that can raise early warnings of financial crisis in the stock market, as well as give insight into the current market trend.
• To propose a model that can be applied to different stock indices in order to predict stock prices. For this study, data was collected from the Dhaka Stock Exchange (DSE), Bangladesh, so that the research results can be compared and evaluated. Four years (2009-2012) of historical data were collected and separated into two groups: a training data set (2009-2011) and a testing data set (2012).
• To compare the prediction results across time series data from different stock indices in order to evaluate performance.
Literature review
Time Series Prediction
• A time series is a sequence of data points, typically measured at successive points in time at uniform intervals.
• Examples of time series are the daily closing values of a stock index, daily exchange rates, daily rainfall, the flow volume of a river, etc.
• A time series analysis consists of two steps:
(1) building a model that represents the time series, and
(2) using the model to predict (forecast) future values.
• Forecasting systems are usually fed some time series members from the last several days, and the next day's closing price is obtained at the system output, i.e.
Close[t-n], Close[t-n+1], …, Close[t-1], Close[t] → Close[t+1]
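The input/output pairing above can be sketched in plain Python, assuming a simple list of closing prices (the function name and data are illustrative, not from the thesis):

```python
# A sketch of the input/output pairing above, assuming a plain Python list
# of closing prices (function name and sample data are illustrative).
def make_supervised(close, n):
    """Pair [Close[t-n+1], ..., Close[t]] with the target Close[t+1]."""
    pairs = []
    for t in range(n - 1, len(close) - 1):
        window = close[t - n + 1:t + 1]       # the last n observed closes
        pairs.append((window, close[t + 1]))  # next day's close as target
    return pairs

pairs = make_supervised([10, 11, 12, 13, 14], n=3)
# pairs[0] == ([10, 11, 12], 13)
```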
Figure 1: Stock Market Time Series (open, high, low, and close values plotted against time in days)
Figure 2: Time Series Prediction Process
Support Vector Machine Regression
• Support vector machine (SVM) is an artificial intelligence-based method developed from statistical learning theory.
• SVM has two major features: classification (SVC) and regression (SVR).
• In SVM regression, the input is first mapped onto an m-dimensional feature space using some fixed (nonlinear) mapping, and then a linear model is constructed in this feature space.
• A margin of tolerance (epsilon) is set in the approximation.
• This type of function is often called an epsilon-insensitive loss function.
• Slack variables are used to overcome noise in the data and non-separability.
Figure 3: Linear and Nonlinear SVR
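A minimal scikit-learn sketch of epsilon-SVR with an RBF kernel on synthetic noisy data illustrates the ideas above; the library choice, data, and parameter values are assumptions for illustration, not the thesis setup:

```python
# Sketch of epsilon-SVR on synthetic noisy data (library choice and
# parameters are illustrative assumptions, not the thesis setup).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # noisy nonlinear signal

# The RBF kernel performs the fixed nonlinear mapping to feature space;
# epsilon sets the tolerance tube, and C penalizes points left outside it
# (handled via slack variables).
model = SVR(kernel="rbf", C=100, epsilon=0.1, gamma=1.0)
model.fit(X, y)
pred = model.predict([[1.5]])[0]  # should land near sin(1.5)
```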
Windowing operator:
• transforms the time series data into a generic data set;
• converts the last row of a window within the time series into a label or target variable;
• feeds the cross-sectional values as inputs to a machine learning technique such as linear regression, a neural network, or a support vector machine.
• Parameters:
- horizon (h)
- window size
- step size
- training window width
- testing window width
Figure 4: converting time series to windowed data
• Normal rectangular windowing
• The single attribute close is selected; the window size is 3 and the horizon is 1.
• So the windowed attributes close-0/close-1/close-2 are generated.
• label = (window size + horizon)th value = (3 + 1) = 4th value
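The rectangular windowing described above can be sketched in plain Python with window size 3, horizon 1, and step size 1 on the single attribute "close" (illustrative code, not the actual windowing operator):

```python
# Sketch of rectangular windowing: window size 3, horizon 1, step size 1,
# on the single attribute "close" (illustrative, not the actual operator).
def window_series(close, window_size=3, horizon=1, step=1):
    rows = []
    for start in range(0, len(close) - window_size - horizon + 1, step):
        w = close[start:start + window_size]              # close-2 .. close-0
        label = close[start + window_size + horizon - 1]  # (WS + Hz)th value
        rows.append({"close-2": w[0], "close-1": w[1],
                     "close-0": w[2], "label": label})
    return rows

rows = window_series([100, 101, 102, 103, 104])
# rows[0] == {"close-2": 100, "close-1": 101, "close-0": 102, "label": 103}
```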
* Since horizon = 1, close-1 is all 0 and close-0 is deleted.
* new close-2 = (old close-2 − old close-1).
* new close-0 = old close-0 − old close-1 (since horizon = 1).
* Since close-0 (old) is selected as base_value, base value = old close-0 − new close-0.
• First, it removes all attributes lying between time point zero (attribute names ending in "-0") and the time point before the horizon values.
• Second, it transforms the corresponding time point zero of the specified label stem into the actual label.
• Last, it re-represents all values relative to the last known time value for each original dimension, including the label value.
• Flatten windowing
• De-flatten windowing
• It adds the value of the base_value special attribute to both the label and the predicted label (if available), so the original data range is restored.
• After that, it removes the base_value special attribute.
* label + close-0 = label_original (since close-0 = base_value)
* close-0 is removed, since it was selected as the base value.
Original time series data → normal windowed data set → flatten windowed data set → de-flatten windowed data set
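One plausible reading of the flatten/de-flatten steps above, in plain Python (the attribute names and the relative-to-base differencing are illustrative assumptions; the original operators may differ in detail):

```python
# One plausible reading of flatten / de-flatten (attribute names and the
# relative-to-base differencing are illustrative assumptions).
def flatten_row(close_2, close_1, close_0, label):
    base = close_0  # the last known value becomes the base_value attribute
    return {"close-2": close_2 - base, "close-1": close_1 - base,
            "label": label - base, "base_value": base}

def deflatten_label(flat_label, base):
    return flat_label + base  # label + base_value restores label_original

row = flatten_row(100, 101, 102, 103)
restored = deflatten_label(row["label"], row["base_value"])  # gives back 103
```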
Some recent research works:
1. "Stock Forecasting Using Support Vector Machine"
• Authors: Lucas K. C. Lai, James N. K. Liu
• Applied techniques: SVM and NN
• Data preprocessing techniques: exponential moving average (EMA15) and relative difference in percentage of price (RDP)
• Domain: Hong Kong Stock Exchange
2. "Stock Index Prediction: A Comparison of MARS, BPN and SVR in an Emerging Market"
• Authors: Chi-Jie Lu, Chih-Hsiang Chang, Chien-Yu Chen, Chih-Chou Chiu, Tian-Shyug Lee
• Applied techniques: multivariate adaptive regression splines (MARS), back-propagation neural network (BPN), support vector regression (SVR), and multiple linear regression (MLR)
• Domain: Shanghai B-share stock index
3. "An Improved Support Vector Regression Modeling for Taiwan Stock Exchange Market Weighted Index Forecasting"
• Authors: Kuan-Yu Chen, Chia-Hui Ho
• Applied techniques: SVR, GA, autoregression (AR)
• Domain: Taiwan Stock Exchange
Methodology
Data collection
• The experiment data set was collected from the Dhaka Stock Exchange (DSE), Bangladesh.
• Four years (January 2009 - June 2012) of historical data were collected.
• Almost 522 companies are listed on the DSE, but for the convenience of the experiment only one well-known company's data was selected.
• The data set had 6 attributes: date, open price, high price, low price, close price, and volume.
• 5 attributes were used in the experiment; volume was excluded.
• In total, 822 days of data: 700 records were used as the training data set and 122 records as the testing data set.
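The chronological split described above can be sketched with a plain Python list standing in for the 822 daily records (the real DSE file and its format are not reproduced here):

```python
# A sketch of the split above: first 700 records for training, last 122
# for testing (a plain list stands in for the real DSE price file).
records = list(range(822))  # stand-in for 822 days of price data
train, test = records[:700], records[700:]
# 700 training records, 122 testing records
```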
Sample raw data (multiple companies, January 25, 2009):

| Company | Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| ATLASBANG | January 25, 2009 | 350 | 354.9 | 335 | 339.8 | 76,950.00 |
| ASIAPACINS | January 25, 2009 | 215.5 | 225 | 215.5 | 220.5 | 9,100.00 |
| ARAMIT | January 25, 2009 | 267.2 | 270.1 | 267.1 | 268.6 | 3,950.00 |
| APEXTANRY | January 25, 2009 | 1060 | 1083 | 1042 | 1051.5 | 37,690.00 |
| APEXSPINN | January 25, 2009 | 472 | 500 | 472 | 484.25 | 380.00 |
| APEXFOODS | January 25, 2009 | 881.25 | 907 | 874 | 885 | 4,645.00 |
| APEXADELFT | January 25, 2009 | 2220 | 2250 | 2185 | 2200.25 | 7,220.00 |
| AMCL(PRAN) | January 25, 2009 | 1150 | 1203 | 1145 | 1181 | 6,960.00 |
| AMBEEPHA | January 25, 2009 | 132 | 139.9 | 130 | 138.3 | 50,800.00 |
| ALARABANK | January 25, 2009 | 419 | 419.25 | 410 | 410.75 | 22,750.00 |
| AIMS1STMF | January 25, 2009 | 15.05 | 15.15 | 14.82 | 14.89 | 2,417,500.00 |
| AGNISYSL | January 25, 2009 | 65 | 66.1 | 63 | 63.3 | 264,500.00 |
| AFTABAUTO | January 25, 2009 | 418.5 | 448 | 418.5 | 442.25 | 155,090.00 |
| ACI | January 25, 2009 | 518 | 538.5 | 514 | 533.1 | 289,100.00 |
| ABBANK | January 25, 2009 | 759.75 | 759.75 | 732.5 | 734.25 | 34,740.00 |
| 8THICB | January 25, 2009 | 495 | 504 | 495 | 498.5 | 250.00 |
| 7THICB | January 25, 2009 | 710 | 710 | 680 | 702.75 | 250.00 |
| 6THICB | January 25, 2009 | 543.75 | 544 | 520 | 524 | 2,840.00 |
| 5THICB | January 25, 2009 | 1005 | 1006 | 999 | 1001.5 | 60.00 |
| 4THICB | January 25, 2009 | 1040 | 1040 | 1010 | 1031.5 | 60.00 |
| 2NDICB | January 25, 2009 | 1782 | 1782 | 1781 | 1781.5 | 15.00 |
| 1STICB | January 25, 2009 | 5100 | 5100 | 5100 | 5100 | 10.00 |
| 1STBSRS | January 25, 2009 | 776 | 790 | 767 | 781.25 | 2,600.00 |

Sample raw data (ACI, January-February 2009):

| Company | Date | Open | High | Low | Close |
|---|---|---|---|---|---|
| ACI | February 25, 2009 | 475 | 477.9 | 463 | 469.1 |
| ACI | February 23, 2009 | 480 | 488 | 476.2 | 479.2 |
| ACI | February 22, 2009 | 479 | 484.9 | 472.2 | 478.5 |
| ACI | February 19, 2009 | 499.9 | 502 | 478 | 483.5 |
| ACI | February 18, 2009 | 484 | 500 | 483 | 494 |
| ACI | February 17, 2009 | 473 | 485 | 472.1 | 477.1 |
| ACI | February 16, 2009 | 476 | 479.5 | 470 | 472.3 |
| ACI | February 15, 2009 | 485 | 490 | 474 | 475.3 |
| ACI | February 12, 2009 | 477 | 485 | 469 | 481.5 |
| ACI | February 11, 2009 | 488 | 488 | 460 | 463.8 |
| ACI | February 10, 2009 | 472.1 | 485.9 | 463 | 481.4 |
| ACI | February 9, 2009 | 489 | 491 | 471.1 | 475.6 |
| ACI | February 8, 2009 | 508 | 508 | 485 | 489.3 |
| ACI | February 5, 2009 | 495.2 | 500.9 | 490.4 | 493.5 |
| ACI | February 4, 2009 | 504 | 505.9 | 492.1 | 495.2 |
| ACI | February 3, 2009 | 500 | 500 | 489.1 | 493.3 |
| ACI | February 2, 2009 | 511 | 512.9 | 497 | 498.5 |
| ACI | February 1, 2009 | 504.2 | 512 | 504 | 505.3 |
| ACI | January 29, 2009 | 515.5 | 521.9 | 506.2 | 508.7 |
| ACI | January 28, 2009 | 509 | 519.9 | 509 | 514.5 |
| ACI | January 27, 2009 | 515 | 519 | 507.1 | 508.8 |
| ACI | January 26, 2009 | 538.5 | 541 | 515 | 516.5 |
| ACI | January 25, 2009 | 518 | 538.5 | 514 | 533.1 |
Training phase
Step 1: Read the training data set from the local repository.
Step 2: Apply the windowing operator to transform the time series data into a generic data set. This step converts the last row of a window within the time series into a label or target variable; the last variable is treated as the label.
Step 3: Run a cross-validation process over the windowed data produced by the windowing operator in order to feed it as input into the SVR model.
Step 4: Select the kernel type and the special parameters of the SVR (C, epsilon (+/-), g, etc.).
Step 5: Run the model and observe the performance (accuracy).
Step 6: If the performance is good, go to step 7; otherwise go back to step 4.
Step 7: Exit the training phase and apply the trained model to the testing data set.
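Steps 3-6 above can be sketched with scikit-learn's cross-validated grid search; this is an illustrative assumption (the thesis used a different toolchain), and the synthetic random-walk series stands in for the real prices:

```python
# Sketch of training-phase steps 3-6 with cross-validated grid search
# (illustrative: synthetic random-walk data, assumed parameter grid).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

close = 100 + np.cumsum(np.random.default_rng(1).normal(0, 1, 120))
X = np.array([close[i:i + 3] for i in range(len(close) - 3)])  # window size 3
y = close[3:]                                                  # horizon 1

# Steps 3-4: cross-validate candidate kernel parameters (C, epsilon).
search = GridSearchCV(SVR(kernel="rbf"),
                      {"C": [100, 1000, 10000], "epsilon": [0.1, 1.0]},
                      cv=5, scoring="neg_mean_absolute_error")
# Steps 5-6: fitting evaluates every combination and keeps the best one.
search.fit(X, y)
best = search.best_params_  # carried forward to the testing phase
```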
Testing phase
Step 1: Read the testing data set from the local repository.
Step 2: Apply the trained model to the out-of-sample data set for price prediction.
Step 3: Produce the predicted trends and stock prices.
[Workflow diagram: training phase and testing phase, each consisting of data pre-processing (windowing) followed by machine learning (SVR)]
Results
Data pre-process & optimized input selection

| Windowing Name | Model | Window size | Step size | Training window | Testing window |
|---|---|---|---|---|---|
| Rectangular | All | 3 | 1 | 30 | 30 |
| Flatten window | 1 day | 3 | 1 | 30 | 30 |
| Flatten window | 5 days | 8 | 1 | 30 | 30 |
| Flatten window | 22 days | 25 | 1 | 30 | 30 |
| De-flatten window | All | 5 | 1 | 30 | 30 |
SVR kernel function parameter settings

| SVR Model | Kernel | C | g | ε | ε+ | ε- |
|---|---|---|---|---|---|---|
| 1 day a-head | RBF | 10000 | 1 | 2 | 1 | 1 |
| 5 days a-head | RBF | 10000 | 1 | 2 | 1 | 1 |
| 22 days a-head | RBF | 10000 | 1 | 2 | 1 | 1 |
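As a quick numeric sanity check of the RBF kernel referenced above, K(x, x') = exp(-g · ||x − x'||²) can be written out directly, here with g = 1 (an illustrative helper, not code from the thesis):

```python
# Numeric check of the RBF kernel: K(x, x') = exp(-g * ||x - x'||^2),
# with g = 1 (illustrative helper, not thesis code).
import math

def rbf_kernel(x, xp, g=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-g * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # identical points give 1.0
print(rbf_kernel([0.0], [3.0]))            # distant points decay toward 0
```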
Model result analysis — monthly actual vs. predicted close price (error = actual − predicted):

Rectangular window:
| Month | 1 day: Actual | Predicted | Error | 5 days: Actual | Predicted | Error | 22 days: Actual | Predicted | Error |
|---|---|---|---|---|---|---|---|---|---|
| Jan'12 | 3846.7 | 3924.9 | -78.2 | 2973.6 | 3115.8 | -142.2 | — | — | — |
| Feb'12 | 3310.9 | 4252.3 | -941.4 | 3310.9 | 3849.9 | -539.0 | 2976.2 | 3387.5 | -411.3 |
| Mar'12 | 4015.8 | 4109.4 | -93.6 | 4015.8 | 3994.9 | 20.9 | 4015.8 | 3280.6 | 735.2 |
| Apr'12 | 5279.5 | 5182.1 | 97.4 | 5279.5 | 5055.1 | 224.4 | 5279.5 | 4253.6 | 1025.9 |
| May'12 | 4315.7 | 4613.3 | -297.6 | 4315.7 | 4763.1 | -447.4 | 4315.7 | 4505.4 | -189.7 |
| Jun'12 | 3437.0 | 4513.2 | -1076.2 | 3437.0 | 4004.1 | -567.1 | 3437.0 | 4300.0 | -863.0 |

Flatten window:
| Month | 1 day: Actual | Predicted | Error | 5 days: Actual | Predicted | Error | 22 days: Actual | Predicted | Error |
|---|---|---|---|---|---|---|---|---|---|
| Jan'12 | 3846.7 | 3883.7 | -37.0 | 2973.6 | 3112.3 | -138.7 | — | — | — |
| Feb'12 | 3310.9 | 3306.5 | 4.4 | 3310.9 | 3319.7 | -8.8 | 2976.2 | 3494.8 | -518.6 |
| Mar'12 | 4015.8 | 3976.7 | 39.1 | 4015.8 | 3899.6 | 116.2 | 4015.8 | 3482.5 | 533.3 |
| Apr'12 | 5279.5 | 5242.2 | 37.3 | 5279.5 | 5139.4 | 140.1 | 5279.5 | 4413.4 | 866.1 |
| May'12 | 4315.7 | 4417.0 | -101.3 | 4315.7 | 4601.0 | -285.3 | 3958.6 | 4321.1 | -362.5 |
| Jun'12 | 3437.0 | 3447.8 | -10.8 | 2604.1 | 2651.1 | -47.0 | 3437.0 | 4417.0 | -980.0 |

De-flatten window:
| Month | 1 day: Actual | Predicted | Error | 5 days: Actual | Predicted | Error | 22 days: Actual | Predicted | Error |
|---|---|---|---|---|---|---|---|---|---|
| Jan'12 | 3410.7 | 6680.7 | -3270.0 | 2551.7 | 4721.2 | -2169.5 | — | — | — |
| Feb'12 | 3310.9 | 9756.1 | -6445.2 | 3310.9 | 12826.9 | -9516.0 | 5506.4 | 2981.4 | 2525.0 |
| Mar'12 | 4015.8 | 7778.7 | -3762.9 | 4015.8 | 8705.9 | -4690.1 | 4015.8 | 10189.4 | -6173.6 |
| Apr'12 | 5279.5 | 10330.9 | -5051.4 | 5279.5 | 8832.3 | -3552.8 | 5279.5 | 8072.5 | -2793.0 |
| May'12 | 4315.7 | 9527.1 | -5211.4 | 4315.7 | 8535.5 | -4219.8 | 4315.7 | 8890.2 | -4574.5 |
| Jun'12 | 3437.0 | 10381.4 | -6944.4 | 3437.0 | 13218.6 | -9781.6 | 3437.0 | 9310.5 | -5873.5 |
[Chart: 1 Day a-head model's results for DSE using Flatten window — actual close (A) vs. predicted close (P), close price (BDT) over days, Jan-Jun 2012]
[Chart: 5 Days a-head model's results for DSE using Flatten window — actual close (A) vs. predicted close (P), close price (BDT) over days, Jan-Jun 2012]
[Chart: 22 days a-head model's results for DSE using Flatten window — actual close (A) vs. predicted close (P), close price (BDT) over days, Jan-May 2012]
Result evaluation technique:
• Error calculation: MAPE
• MAPE (Mean Absolute Percentage Error) was used to calculate the error rate between the actual and predicted prices:

MAPE = (100 / n) * sum_{i=1}^{n} |A_i − P_i| / A_i

Here,
A = actual price
P = predicted price
n = number of data points counted

MAPE results:

| Model | Horizon | Rectangular window | Flatten window | De-flatten window |
|---|---|---|---|---|
| 1 day a-head | 1 | 0.65 | 0.08 | 6.84 |
| 5 days a-head | 5 | 0.48 | 0.19 | 8.28 |
| 22 days a-head | 22 | 0.82 | 0.74 | 5.40 |
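The MAPE formula above can be written as a small helper (illustrative, not thesis code):

```python
# The MAPE formula as a small helper (illustrative, not thesis code).
def mape(actual, predicted):
    """Mean Absolute Percentage Error between two equal-length sequences."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - p) / a for a, p in zip(actual, predicted))

print(mape([100, 200], [110, 190]))  # (10/100 + 10/200) / 2 * 100 = 7.5
```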
[Chart: MAPE for Rectangular window — monthly MAPE (error), Jan-June, for the 1 day, 5 days, and 22 days a-head models]
[Chart: MAPE for Flatten window — monthly MAPE (error), Jan-June, for the 1 day, 5 days, and 22 days a-head models]
[Chart: MAPE for De-Flatten window — monthly MAPE (error), Jan-June, for the 1 day, 5 days, and 22 days a-head models]
• Comparison with other index data (MAPE):

| Window Type | Model | S&P 500 | DSE | IBM |
|---|---|---|---|---|
| De-Flatten | 1 day a-head | 3.99 | 6.84 | 5.84 |
| De-Flatten | 5 days a-head | 3.86 | 8.28 | 6.51 |
| De-Flatten | 22 days a-head | 2.45 | 5.40 | 6.19 |
| Flatten | 1 day a-head | 0.01 | 0.08 | 0.01 |
| Flatten | 5 days a-head | 0.03 | 0.19 | 0.47 |
| Flatten | 22 days a-head | 0.14 | 0.74 | 0.21 |
| Rectangular | 1 day a-head | 0.65 | 0.65 | 0.02 |
| Rectangular | 5 days a-head | 0.74 | 0.48 | 0.57 |
| Rectangular | 22 days a-head | 1.43 | 0.82 | 3.22 |
** S&P 500 and IBM index data were collected from Google Finance: http://www.google.com/finance
Conclusion
Discussion:
• Different windowing functions can produce different prediction results.
• In this study, 3 types of windowing operators were used: the normal rectangular window, the flatten window, and the de-flatten window.
• The rectangular and flatten windows are able to produce good prediction results for time series data.
• The de-flatten window cannot produce good prediction results.
Future work:
• Apply other windowing operators.
• Compare the model's results with other machine learning techniques.
• Publications
1) P. Meesad, R. I. Rasel. "Dhaka Stock Exchange Trend Analysis Using Support Vector Regression." In: Advances in Intelligent Systems and Computing 209 (Springer), 2013, pp. 135-143.
2) Phayung Meesad, Risul Islam Rasel. "Stock Market Price Prediction Using Support Vector Regression." In: 2nd International Conference on Informatics, Electronics and Vision (ICIEV 2013), indexed in IEEE Xplore, pp. 1-6.
• Presentations
1) 9th International Conference on Computing and Information Technology (IC2IT 2013)
2) 2nd International Conference on Informatics, Electronics and Vision (ICIEV 2013)
THANK YOU