Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning techniques

Download Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning techniques

Post on 20-Feb-2017

230 views

Category:

Documents

1 download

Embed Size (px)

TRANSCRIPT

<ul><li><p>1</p><p>3</p><p>4</p><p>5</p><p>6</p><p>7 Q1</p><p>8</p><p>910</p><p>1 2</p><p>1314</p><p>15161718192021</p><p>2 2</p><p>39</p><p>40</p><p>41</p><p>42</p><p>43</p><p>44</p><p>45</p><p>46</p><p>47</p><p>48</p><p>49</p><p>50</p><p>51</p><p>52</p><p>53</p><p>54</p><p>55</p><p>56</p><p>57</p><p>58</p><p>59</p><p>Q2</p><p>Expert Systems with Applications xxx (2014) xxxxxx</p><p>ESWA 9461 No. of Pages 10, Model 5G</p><p>8 August 2014</p><p>Q1Contents lists available at ScienceDirect</p><p>Expert Systems with Applications</p><p>journal homepage: www.elsevier .com/locate /eswaPredicting stock and stock price index movement using TrendDeterministic Data Preparation and machine learning techniqueshttp://dx.doi.org/10.1016/j.eswa.2014.07.0400957-4174/ 2014 Published by Elsevier Ltd.</p><p> Corresponding author.</p><p>Please cite this article in press as: Patel, J., et al. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and mlearning techniques. Expert Systems with Applications (2014), http://dx.doi.org/10.1016/j.eswa.2014.07.040Jigar Patel, Sahil Shah, Priyank Thakkar , K. KotechaComputer Science &amp; Engineering Department, Institute of Technology, Nirma University, Ahmedabad, Gujarat, India</p><p>a r t i c l e i n f o a b s t r a c t232425262728293031Article history:Available online xxxx</p><p>Keywords:Naive-Bayes classificationArtificial Neural NetworksSupport vector machineRandom forestStock market323334353637This paper addresses problem of predicting direction of movement of stock and stock price index forIndian stock markets. The study compares four prediction models, Artificial Neural Network (ANN), sup-port vector machine (SVM), random forest and Naive-Bayes with two approaches for input to these mod-els. The first approach for input data involves computation of ten technical parameters using stocktrading data (open, high, low &amp; close prices) while the second approach focuses on representing thesetechnical parameters as trend deterministic data. Accuracy of each of the prediction models for each ofthe two input approaches is evaluated. Evaluation is carried out on 10 years of historical data from2003 to 2012 of two stocks namely Reliance Industries and Infosys Ltd. and two stock price indicesCNX Nifty and S&amp;P Bombay Stock Exchange (BSE) Sensex. The experimental results suggest that for thefirst approach of input data where ten technical parameters are represented as continuous values, ran-dom forest outperforms other three prediction models on overall performance. Experimental results alsoshow that the performance of all the prediction models improve when these technical parameters arerepresented as trend deterministic data.</p><p> 2014 Published by Elsevier Ltd.3860</p><p>61</p><p>62</p><p>63</p><p>64</p><p>65</p><p>66</p><p>67</p><p>68</p><p>69</p><p>70</p><p>71</p><p>72</p><p>73</p><p>74</p><p>75</p><p>76</p><p>77</p><p>78</p><p>79</p><p>801. Introduction</p><p>Predicting stock and stock price index is difficult due to uncer-tainties involved. There are two types of analysis which investorsperform before investing in a stock. First is the fundamental anal-ysis. In this, investors looks at intrinsic value of stocks, perfor-mance of the industry and economy, political climate etc. todecide whether to invest or not. On the other hand, technical anal-ysis is the evaluation of stocks by means of studying statistics gen-erated by market activity, such as past prices and volumes.Technical analysts do not attempt to measure a security s intrinsicvalue but instead use stock charts to identify patterns and trendsthat may suggest how a stock will behave in the future. Efficientmarket hypothesis by Malkiel and Fama (1970) states that pricesof stocks are informationally efficient which means that it is possi-ble to predict stock prices based on the trading data. This is quitelogical as many uncertain factors like political scenario of country,public image of the company will start reflecting in the stockprices. So if the information obtained from stock prices is pre-pro-cessed efficiently and appropriate algorithms are applied thentrend of stock or stock price index may be predicted.81</p><p>82</p><p>83Since years, many techniques have been developed to predictstock trends. Initially classical regression methods were used topredict stock trends. Since stock data can be categorized as non-stationary time series data, non-linear machine learning tech-niques have also been used. Artificial Neural Networks (ANN)and Support Vector Machine (SVM) are two machine learning algo-rithms which are most widely used for predicting stock and stockprice index movement. Each algorithm has its own way to learnpatterns. ANN emulates functioning of our brain to learn by creat-ing network of neurons. Hassan, Nath, and Kirley (2007) proposedand implemented a fusion model by combining the Hidden MarkovModel (HMM), Artificial Neural Networks (ANN) and Genetic Algo-rithms (GA) to forecast financial market behavior. Using ANN, thedaily stock prices are transformed to independent sets of valuesthat become input to HMM. Wang and Leu (1996) developed a pre-diction system useful in forecasting mid-term price trend in Tai-wan stock market. Their system was based on a recurrent neuralnetwork trained by using features extracted from ARIMA analyses.Empirical results showed that the networks trained using 4-yearweekly data was capable of predicting up to 6 weeks market trendwith acceptable accuracy. Hybridized soft computing techniquesfor automated stock market forecasting and trend analysis wasintroduced by Abraham, Nath, and Mahanti (2001). They used Nas-daq-100 index of Nasdaq stock market with neural network for oneachine</p><p>http://dx.doi.org/10.1016/j.eswa.2014.07.040http://dx.doi.org/10.1016/j.eswa.2014.07.040http://www.sciencedirect.com/science/journal/09574174http://www.elsevier.com/locate/eswahttp://dx.doi.org/10.1016/j.eswa.2014.07.040</p></li><li><p>84</p><p>85</p><p>86</p><p>87</p><p>88</p><p>89</p><p>90</p><p>91</p><p>92</p><p>93</p><p>94</p><p>95</p><p>96</p><p>97</p><p>98</p><p>99</p><p>100</p><p>101</p><p>102</p><p>103</p><p>104</p><p>105</p><p>106</p><p>107</p><p>108</p><p>109</p><p>110</p><p>111</p><p>112</p><p>113</p><p>114</p><p>115</p><p>116</p><p>117</p><p>118</p><p>119</p><p>120</p><p>121</p><p>122</p><p>123</p><p>124</p><p>125</p><p>126</p><p>127</p><p>128</p><p>129</p><p>130</p><p>131</p><p>132</p><p>133</p><p>134</p><p>135</p><p>136</p><p>137</p><p>138</p><p>139</p><p>140</p><p>141</p><p>142</p><p>143</p><p>144</p><p>145</p><p>146</p><p>147</p><p>148</p><p>149</p><p>150</p><p>151</p><p>152</p><p>153</p><p>154</p><p>155</p><p>156</p><p>157</p><p>158</p><p>159</p><p>160</p><p>161</p><p>162</p><p>163</p><p>164</p><p>165</p><p>166</p><p>167</p><p>168</p><p>169</p><p>170</p><p>171</p><p>172</p><p>173</p><p>174</p><p>175</p><p>176</p><p>177</p><p>178</p><p>179</p><p>180</p><p>181</p><p>182</p><p>183</p><p>184</p><p>185</p><p>186</p><p>187</p><p>188</p><p>189</p><p>190</p><p>191</p><p>192</p><p>193</p><p>194</p><p>195</p><p>196</p><p>197</p><p>198</p><p>199</p><p>200</p><p>201</p><p>202</p><p>203</p><p>204</p><p>205</p><p>206</p><p>207</p><p>208</p><p>209</p><p>210</p><p>211</p><p>212</p><p>213</p><p>214</p><p>215</p><p>2 J. PatelQ1 et al. / Expert Systems with Applications xxx (2014) xxxxxx</p><p>ESWA 9461 No. of Pages 10, Model 5G</p><p>8 August 2014</p><p>Q1day ahead stock forecasting and a neuro-fuzzy system for analys-ing the trend of the predicted stock values. The forecasting andtrend prediction results using the proposed hybrid system werepromising. Chen, Leung, and Daouk (2003) investigated the proba-bilistic neural network (PNN) to forecast the direction of indexafter it was trained by historical data. Empirical results showedthat the PNN-based investment strategies obtained higher returnsthan other investment strategies examined in the study like thebuy-and-hold strategy as well as the investment strategies guidedby forecasts estimated by the random walk model and the para-metric GMM models.</p><p>A very well-known SVM algorithm developed by Vapnik (1999)searches for a hyper plane in higher dimension to separate classes.Support vector machine (SVM) is a very specific type of learningalgorithms characterized by the capacity control of the decisionfunction, the use of the kernel functions and the scarcity of thesolution. Huang, Nakamori, and Wang (2005) investigated the pre-dictability of financial movement direction with SVM by forecast-ing the weekly movement direction of NIKKEI 225 index. Theycompared SVM with Linear Discriminant Analysis, Quadratic Dis-criminant Analysis and Elman Backpropagation Neural Networks.The experiment results showed that SVM outperformed the otherclassification methods. SVM was used by Kim (2003) to predictthe direction of daily stock price change in the Korea compositestock price index (KOSPI). Twelve technical indicators wereselected to make up the initial attributes. This study comparedSVM with back-propagation neural network (BPN) and case-basedreasoning (CBR). It was evident from the experimental results thatSVM outperformed BPN and CBR.</p><p>Random forest creates n classification trees using sample withreplacement and predicts class based on what majority of treespredict. The trained ensemble, therefore, represents a singlehypothesis. This hypothesis, however, is not necessarily containedwithin the hypothesis space of the models from which it is built.Thus, ensembles can be shown to have more flexibility in the func-tions they can represent. This flexibility can, in theory, enable themto over-fit the training data more than a single model would, but inpractice, some ensemble techniques (especially bagging) tend toreduce problems related to over-fitting of the training data. Tsai,Lin, Yen, and Chen (2011) investigated the prediction performancethat utilizes the classifier ensembles method to analyze stockreturns. The hybrid methods of majority voting and bagging wereconsidered. Moreover, performance using two types of classifierensembles were compared with those using single baseline classi-fiers (i.e. neural networks, decision trees, and logistic regression).The results indicated that multiple classifiers outperform singleclassifiers in terms of prediction accuracy and returns on invest-ment. Sun and Li (2012) proposed new Financial distress prediction(FDP) method based on SVM ensemble. The algorithm for selectingSVM ensemble s base classifiers from candidate ones was designedby considering both individual performance and diversity analysis.Experimental results indicated that SVM ensemble was signifi-cantly superior to individual SVM classifier. Ou and Wang (2009)used total ten data mining techniques to predict price movementof Hang Seng index of Hong Kong stock market. The approachesinclude Linear discriminant analysis (LDA), Quadratic discriminantanalysis (QDA), K-nearest neighbor classification, Naive Bayesbased on kernel estimation, Logit model, Tree based classification,neural network, Bayesian classification with Gaussian process,Support vector machine (SVM) and Least squares support vectormachine (LS-SVM). Experimental results showed that the SVMand LS-SVM generated superior predictive performance amongthe other models.</p><p>It is evident from the above discussions that each of the algo-rithms in its own way can tackle this problem. It is also to benoticed that each of the algorithm has its own limitations. The finalPlease cite this article in press as: Patel, J., et al. Predicting stock and stock pricelearning techniques. Expert Systems with Applications (2014), http://dx.doi.org/prediction outcome not only depends on the prediction algorithmused but is also influenced by the representation of the input. Iden-tifying important features and using only them as the input ratherthan all the features may improve the prediction accuracy of theprediction models. A two-stage architecture was developed byHsu, Hsieh, Chih, and Hsu (2009). They integrated self-organizingmap and support vector regression for stock price prediction. Theyexamined seven major stock market indices. Specifically, the self-organizing map (SOM) was first used to decompose the wholeinput space into regions where data points with similar statisticaldistributions were grouped together, so as to contain and capturethe non-stationary property of financial series. After decomposingheterogeneous data points into several homogenous regions, sup-port vector regression (SVR) was applied to forecast financial indi-ces. The results suggested that the two stage architecture provideda promising alternative for stock price prediction. Genetic pro-gramming (GP) and its variants have been extensively applied formodeling of the stock markets. To improve the generalization abil-ity of the model, GP have been hybridized with its own variants(gene expression programming (GEP), multi expression program-ming (MEP)) or with the other methods such as neural networksand boosting. The generalization ability of the GP model can alsobe improved by an appropriate choice of model selection criterion.Garg, Sriram, and Tai (2013) worked to analyse the effect of threemodel selection criteria across two data transformations on theperformance of GP while modeling the stock indexed in the NewYork Stock Exchange (NYSE). It was found that FPE criteria haveshown a better fit for the GP model on both data transformationsas compared to other model selection criteria. Nair et al. (2011)predicted the next day s closing value of five international stockindices using an adaptive Artificial Neural Network based system.The system adapted itself to the changing market dynamics withthe help of genetic algorithm which tunes the parameters of theneural network at the end of each trading session. The study byAhmed (2008) investigated the nature of the causal relationshipsbetween stock prices and the key macro-economic variables repre-senting real and financial sector of the Indian economy for the per-iod March, 19952007 using quarterly data. The study revealedthat the movement of stock prices was not only the outcome ofbehavior of key macro-economic variables but it was also one ofthe causes of movement in other macro dimension in the economy.Mantri, Gahan, and Nayak (2010) calculated the volatilities ofIndian stock markets using GARCH, EGARCH, GJR-GARCH, IGARCH&amp; ANN models. This study used Fourteen years of data of BSE Sen-sex &amp; NSE Nifty to calculate the volatilities. It was concluded thatthere is no difference in the volatilities of Sensex, &amp; Nifty estimatedunder the GARCH, EGARCH, GJR GARCH, IGARCH &amp; ANN models.Mishra, Sehgal, and Bhanumurthy (2011) tested for the presenceof nonlinear dependence and deterministic chaos in the rate ofreturns series for six Indian stock market indices. The result ofanalysis suggested that the returns series did not follow a randomwalk process. Rather it appeared that the daily increments in stockreturns were serially correlated and the estimated Hurst exponentswere indicative of marginal persistence in equity returns. Liu andWang (2012) investigated and forecast the price fluctuation byan improved Legendre neural network by assuming that the inves-tors decided their investing positions by analysing the historicaldata on the stock market. They also introduced a random timestrength function in the forecasting model. The MorphologicalRank Linear Forecasting (EMRLF) method was proposed byArajo and Ferreira (2013). An experimental analysis was con-ducted and the results were compared to Multilayer Perceptron(MLP) networks and Time-delay Added Evolutionary Forecasting(TAEF) method.</p><p>This study focuses on comparing prediction performance ofANN, SVM, random forest and Naive-Bayes algorithms for the taskindex movement using Trend Deterministic Data Preparation and machine10.1016/j.eswa.2014.07.040</p><p>http://dx.doi.org/10.1016/j.eswa.2014.07.040</p></li><li><p>216</p><p>217</p><p>218</p><p>219</p><p>220</p><p>221</p><p>222</p><p>223</p><p>224</p><p>225</p><p>226</p><p>227</p><p>228</p><p>229</p><p>230</p><p>231</p><p>232</p><p>233</p><p>234</p><p>235</p><p>236</p><p>237</p><p>238</p><p>239</p><p>240</p><p>241</p><p>242</p><p>243</p><p>244</p><p>245</p><p>246</p><p>247</p><p>248</p><p>249</p><p>250</p><p>251</p><p>252</p><p>253</p><p>254</p><p>255</p><p>256</p><p>257</p><p>258</p><p>259</p><p>260</p><p>261</p><p>262</p><p>263</p><p>264</p><p>265</p><p>Table 2The number of increase and decrease cases in each year in the parameter setting dataset...</p></li></ul>

Recommended

View more >