predicting stock and stock price index movement using trend deterministic data preparation and...
Post on 20-Feb-2017
Embed Size (px)
Expert Systems with Applications xxx (2014) xxxxxx
ESWA 9461 No. of Pages 10, Model 5G
8 August 2014
Q1Contents lists available at ScienceDirect
Expert Systems with Applications
journal homepage: www.elsevier .com/locate /eswaPredicting stock and stock price index movement using TrendDeterministic Data Preparation and machine learning techniqueshttp://dx.doi.org/10.1016/j.eswa.2014.07.0400957-4174/ 2014 Published by Elsevier Ltd.
Please cite this article in press as: Patel, J., et al. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and mlearning techniques. Expert Systems with Applications (2014), http://dx.doi.org/10.1016/j.eswa.2014.07.040Jigar Patel, Sahil Shah, Priyank Thakkar , K. KotechaComputer Science & Engineering Department, Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
a r t i c l e i n f o a b s t r a c t232425262728293031Article history:Available online xxxx
Keywords:Naive-Bayes classificationArtificial Neural NetworksSupport vector machineRandom forestStock market323334353637This paper addresses problem of predicting direction of movement of stock and stock price index forIndian stock markets. The study compares four prediction models, Artificial Neural Network (ANN), sup-port vector machine (SVM), random forest and Naive-Bayes with two approaches for input to these mod-els. The first approach for input data involves computation of ten technical parameters using stocktrading data (open, high, low & close prices) while the second approach focuses on representing thesetechnical parameters as trend deterministic data. Accuracy of each of the prediction models for each ofthe two input approaches is evaluated. Evaluation is carried out on 10 years of historical data from2003 to 2012 of two stocks namely Reliance Industries and Infosys Ltd. and two stock price indicesCNX Nifty and S&P Bombay Stock Exchange (BSE) Sensex. The experimental results suggest that for thefirst approach of input data where ten technical parameters are represented as continuous values, ran-dom forest outperforms other three prediction models on overall performance. Experimental results alsoshow that the performance of all the prediction models improve when these technical parameters arerepresented as trend deterministic data.
2014 Published by Elsevier Ltd.3860
Predicting stock and stock price index is difficult due to uncer-tainties involved. There are two types of analysis which investorsperform before investing in a stock. First is the fundamental anal-ysis. In this, investors looks at intrinsic value of stocks, perfor-mance of the industry and economy, political climate etc. todecide whether to invest or not. On the other hand, technical anal-ysis is the evaluation of stocks by means of studying statistics gen-erated by market activity, such as past prices and volumes.Technical analysts do not attempt to measure a security s intrinsicvalue but instead use stock charts to identify patterns and trendsthat may suggest how a stock will behave in the future. Efficientmarket hypothesis by Malkiel and Fama (1970) states that pricesof stocks are informationally efficient which means that it is possi-ble to predict stock prices based on the trading data. This is quitelogical as many uncertain factors like political scenario of country,public image of the company will start reflecting in the stockprices. So if the information obtained from stock prices is pre-pro-cessed efficiently and appropriate algorithms are applied thentrend of stock or stock price index may be predicted.81
83Since years, many techniques have been developed to predictstock trends. Initially classical regression methods were used topredict stock trends. Since stock data can be categorized as non-stationary time series data, non-linear machine learning tech-niques have also been used. Artificial Neural Networks (ANN)and Support Vector Machine (SVM) are two machine learning algo-rithms which are most widely used for predicting stock and stockprice index movement. Each algorithm has its own way to learnpatterns. ANN emulates functioning of our brain to learn by creat-ing network of neurons. Hassan, Nath, and Kirley (2007) proposedand implemented a fusion model by combining the Hidden MarkovModel (HMM), Artificial Neural Networks (ANN) and Genetic Algo-rithms (GA) to forecast financial market behavior. Using ANN, thedaily stock prices are transformed to independent sets of valuesthat become input to HMM. Wang and Leu (1996) developed a pre-diction system useful in forecasting mid-term price trend in Tai-wan stock market. Their system was based on a recurrent neuralnetwork trained by using features extracted from ARIMA analyses.Empirical results showed that the networks trained using 4-yearweekly data was capable of predicting up to 6 weeks market trendwith acceptable accuracy. Hybridized soft computing techniquesfor automated stock market forecasting and trend analysis wasintroduced by Abraham, Nath, and Mahanti (2001). They used Nas-daq-100 index of Nasdaq stock market with neural network for oneachine
2 J. PatelQ1 et al. / Expert Systems with Applications xxx (2014) xxxxxx
ESWA 9461 No. of Pages 10, Model 5G
8 August 2014
Q1day ahead stock forecasting and a neuro-fuzzy system for analys-ing the trend of the predicted stock values. The forecasting andtrend prediction results using the proposed hybrid system werepromising. Chen, Leung, and Daouk (2003) investigated the proba-bilistic neural network (PNN) to forecast the direction of indexafter it was trained by historical data. Empirical results showedthat the PNN-based investment strategies obtained higher returnsthan other investment strategies examined in the study like thebuy-and-hold strategy as well as the investment strategies guidedby forecasts estimated by the random walk model and the para-metric GMM models.
A very well-known SVM algorithm developed by Vapnik (1999)searches for a hyper plane in higher dimension to separate classes.Support vector machine (SVM) is a very specific type of learningalgorithms characterized by the capacity control of the decisionfunction, the use of the kernel functions and the scarcity of thesolution. Huang, Nakamori, and Wang (2005) investigated the pre-dictability of financial movement direction with SVM by forecast-ing the weekly movement direction of NIKKEI 225 index. Theycompared SVM with Linear Discriminant Analysis, Quadratic Dis-criminant Analysis and Elman Backpropagation Neural Networks.The experiment results showed that SVM outperformed the otherclassification methods. SVM was used by Kim (2003) to predictthe direction of daily stock price change in the Korea compositestock price index (KOSPI). Twelve technical indicators wereselected to make up the initial attributes. This study comparedSVM with back-propagation neural network (BPN) and case-basedreasoning (CBR). It was evident from the experimental results thatSVM outperformed BPN and CBR.
Random forest creates n classification trees using sample withreplacement and predicts class based on what majority of treespredict. The trained ensemble, therefore, represents a singlehypothesis. This hypothesis, however, is not necessarily containedwithin the hypothesis space of the models from which it is built.Thus, ensembles can be shown to have more flexibility in the func-tions they can represent. This flexibility can, in theory, enable themto over-fit the training data more than a single model would, but inpractice, some ensemble techniques (especially bagging) tend toreduce problems related to over-fitting of the training data. Tsai,Lin, Yen, and Chen (2011) investigated the prediction performancethat utilizes the classifier ensembles method to analyze stockreturns. The hybrid meth