technical analysis inspired machine learning for stock ...928205/fulltext01.pdf · examensarbete...

INOM EXAMENSARBETE TEKNIK,GRUNDNIVÅ, 15 HP

, STOCKHOLM SVERIGE 2016

Technical Analysis inspired Machine Learning for Stock Market Data

GUSTAV KIHLSTRÖM

PATRYK PRZYBYSZ

KTHSKOLAN FÖR DATAVETENSKAP OCH KOMMUNIKATION

Teknisk analys-inspirerad maskininlarning for

aktiemarknadsdata

Gustav KihlstromPatryk Przybysz

Supervisor: Marten Bjorkman

Examiner: Orjan Ekeberg

11/05/2016

1

Abstract

In this thesis we evaluate four different machine learning algorithms,namely Naive Bayes Classifier, Support Vector Machines, Extreme Learn-ing Machine and Random Forest in the context of stock market invest-ments. The aim is to provide additional information that can be beneficialwhen creating stock market models to be used in a machine learning set-ting. All four algorithms are trained on different configurations of data,based on concepts from technical analysis. The configurations containclosing prices, volatility and trading volume in different combinations.These variables are taken from past trading days, where the number ofdays from which data is to be collected ranges from 2 to 30. The resultingpredictors attained from the various algorithms and configurations abovereach accuracy rates between 50−54%. This thesis concludes that the ef-fect of the different evaluated features vary depending on which algorithmis used as well as how many past trading days are included. Concluding, itis ascertained that the usage of volatility features should at least be con-sidered when building a machine learning model in a stock market context.

I den har rapporten utvarderar vi fyra olika maskininlarningsalgoritmer,namligen Naive Bayes klassifieraren, Support Vector Machines, ExtremeLearning Machines, och Random Forest, applicerat pa aktiemarknads-data. Malet ar att tillfora nya insikter som kan komma till anvandningvid skapandet av aktiemarknadsmodeller baserade pa maskininlarningsal-goritmer. Alla fyra algoritmerna tranas pa olika konfigurationer av databaserade pa koncept inspirerade av teknisk analys. Konfigurationernainnehaller slutpriser, volatilitet, och handelsvolym i olika kombinationer.Vardena ar tagna fran redan passerade handelsdagar, dar antalet passer-ade dagar fran vilka data ar hamtad varieras fran tva till 30 dagar. Deresulterade modellerna, baserade pa de olika konfigurationerna och antalhandelsdagar, uppnadde en precision pa 50 − 54%. Rapporten fastslaratt den utvunna effekten av de olika konfigurationerna varierar pa vilkenalgoritm som anvands, samt pa hur manga tidigare handelsdagar somanvands. Tillsist sa dras slutsatsen att volatiliteten ar vard att ha i atankevid skapandet av maskininlarningsmodeller for aktiemarknadsdata.

2

Contents

1 Introduction 41.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Background 52.1 Technical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1.1 Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.1.2 Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . 62.3 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.3.1 Decision trees . . . . . . . . . . . . . . . . . . . . . . . . . 72.3.2 Bagging . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.3.3 Random forest . . . . . . . . . . . . . . . . . . . . . . . . 8

2.4 Extreme learning machines . . . . . . . . . . . . . . . . . . . . . 82.5 Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.6 Cross-validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3 Method 103.1 Price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.2 Volatility measure . . . . . . . . . . . . . . . . . . . . . . . . . . 103.3 Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.4 Days looking backwards . . . . . . . . . . . . . . . . . . . . . . . 103.5 Data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.6 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113.7 Algorithmic specifications . . . . . . . . . . . . . . . . . . . . . . 11

4 Experimental results 124.1 SVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124.2 ELM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144.3 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164.4 Naive-Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184.5 All algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204.6 All configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

5 Discussion 225.1 SVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225.2 ELM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225.3 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235.4 Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235.5 All algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245.6 All configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

6 Conclusions 25

7 Sources 26

3

1 Introduction

Forecasting the behavior of financial products is an important preoccupationfor many professionals and laymen alike, as well as academics. Investmentbanks, hedge funds, and insurance companies spend significant time and effortin predicting changes in the properties, such as prices and risks, of stocks, bonds,options, etc. This interest in financial markets has for the last few decadesdriven the development of better tools for analyzing financial information. Themost fruitful of which, historically speaking, has perhaps been the research infinancial mathematics.

Technical analysis is the study of past information to predict future values.Technical analysis is commonly used on financial markets and can be used onfor example stock prices. Mathematical concepts are used, for example theaverage moving index, but the mathematical sophistication of these tend to begenerally low. Technical analysis also takes the form of grouping continuousdata into discrete packets of data to visually summarize the data, sometimesreferred to as candlesticks.

Candlestick charts are frequently used in technical analysis to represent timeseries. Time series are divided into specific units, most commonly days, andtransformed into “candlesticks”. Each candlestick holds information about thestarting and final value of that time period as well as the highest and lowestlevels reached in between. A candlestick can thereby be summarized as startand end values combined with the volatility of the underlying value.

Financial mathematics is the use of advanced mathematical and statisticalmethods to price and minimize risk of financial products and portfolios. Anexample of this is the Black-Scholes method for options pricing (Black, et al1973) and the LIBOR market model for pricing interest rates derivatives (Bruce,et al. 1997). With the increasing access to computational power the use ofartificial intelligence, particularly machine learning has become an importantaspect of market prediction.

Machine learning has been successfully used for time-series financial fore-casting. Support vector machines (SVM) and artifical neural networks (ANN)have been widely used within academics to predict financial data. Cao andTays work on SVMs and ANNs cemented SVMs as an important tool for stockprediction. The SVM has since been expanded upon to form a multiple-kernelbased learning system for stock prediction (Fletcher, 2012) (Yeh, et al., 2010)and credit risk evaluation (Yu, et a.l, 2009), and the extreme learning machines(ELM), a type of ANN, has been developed and used for financial prediction(Ding, et al 2015). Simpler approaches such as decision trees have also beensuccessfully used (Audrino, et al 2010), whereas very little research has beendone on the Naive-Bayes classifier for stock data.

There has been little research on the performance of various combinations ofmachine learning and technical analysis methods. The research that has beenmade on the subject is generally focused on one, or a few, learning algorithmsand little weight is put on the the importance of the choice of parameters. Assuch there is a gap of knowledge as to which parameters work well with which

4

algorithms. It is our ambition to contribute to filling that gap by analyzingthe performance of a few selected algorithms from different machine learningparadigms with alternating input parameters.

1.1 Problem statement

What is the performance of the SVM, ELM, Random-Forest, and Naive-bayesalgorithms on financial data with respect to changes in price, volatility, tradingvolume, and different time intervals, as well as combinations of these?

2 Background

2.1 Technical analysis

Technical analysis is a method for anticipating market price changes by studyinghistoric data for that market. It primarily focuses on past price levels, buttrading volumes are used as well. The fundamental assumption of technicalanalysis is that market prices reflect all available information (Snopek, 2012)separating it from other security analysis methodologies such as fundamentalanalysis, where the state of the underlying asset is ascertained by reviewing itsfinancial statements. This reflection is not seen as perfect because the relevantinformation might not always be wholly available due to, for example, insidertrading (Snopek, 2012).According to technical analysis, prices move in distinct trends. A trend isbullish if the highest and lowest price levels for a security are gradually higherand bearish if the opposite is true. A flat trend occurs when price levels staythe same during a longer time interval.A third concept of technical analysis is that history repeats itself and ”thefuture is just a repetition of the past” (Snopek, 2012) and so past errors andevents, such as market crashes or speculative bubbles, might be repeated bynew generations.There is no consensus whether technical analysis can be effectively used as aninvestment strategy. Some studies have been made that show technical analysisto be no better than simply buying a security and holding it until an opportunetime, without any portfolio strategy.

2.1.1 Volatility

Volatility has been widely used in financial market prediction. Many modelswithin mathematical finance, including the Black-Scholes model, use volatility asone of the main components for handling pricing. Machine learning algorithmshave been used to predict future volatility, including SVMs (Wang, et al. 2011),ELMs (Wang, et al. 2014), and regression trees (Audrino, et al. 2009). Howeverlittle research on the predictive capabilities of using volatility as an indicatorfor future market behavior has been done. In this paper we will examine howusing volatility as a predictor, on several different machine learning algorithms,

5

applied on stock price data influences the accuracy in prediction of future pricedevelopment.

2.1.2 Volume

The volume of trades, i.e. the number of trades done during the time intervalof some underlying asset, is one of the main aspects of technical analysis. Thegeneral idea is that changes in price of the asset together with a small volume willhave a different impact on the market than the same change in price associatedwith a larger trading volume.

2.2 Support Vector Machines

The SVM algorithm is effectively trying to place a high-dimensional plane, orhyperplane, between the differently classified sets of data points and predictingnew data points based on which side of the hyperplane they are located. Data ishowever often not dividable by a plane, which the SVM handles by using slackvariables and kernels.The aim of a basic separating hyperplane algorithm is to separate the differentclasses of data using a hyperplane, while achieving the maximum margin be-tween the closest data points of the different classes and the separating plane.SVM is an extension of the separating hyperplane idea by representing the datain higher dimensions to spread the distribution of the low dimensional data.This can be summarized as the following optimization problem;

min1

2~wT ~w (1)

subject to ti ~wT Φ(~xi) ≥ 1,

where ~w is the weight vector of the hyperplane, ~xi is the ith training point, tiis the class of the ith training point, and Φ(~x) is the non-linear transformationused. This formulation is however often too rigid, and unable to find an optimalsolution if there are data points from different classes intertwined. To deal withthis problem slack variables are introduced and the problem is rewritten as;

min1

2~wT ~w + C

∑i

ζi (2)

subject to ti ~wT Φ(~xi) ≥ 1− ζi,

where ζi are the slack variables and C is a slack constant (James, G. et al, 2014).Also this formulation can be restated, in the form of it’s dual problem;

max∑i

αi −1

2

∑i,j

αiαjtitjκ(~xi, ~xj) (3)

subject to 0 ≤ αi ≤ C,

6

where κ(~xi, ~xj) = Φ(~xi)T Φ( ~xj) is the kernel. Once the optimization problem

has been solved, i.e. the model has been trained, unseen data points, ~x, areclassified with

∑i

αitiκ(~x, ~xi) > 0.

2.3 Random Forest

Random forest is a classification algorithm belonging to the tree-based machinelearning algorithms. Tree-based methods attempt to divide the space of possibleoutcomes into generally simple areas. This is done by splitting the outcomespace in two, one feature at a time. Tree-based methods are believed to reflectthe way humans make decisions. They also allow for easy inference even in itssimplest forms, but usually need to be improved in order to reach predictionaccuracy rates comparable to other algorithms (James, G. et al, 2014). Suchimprovement can be attained by for instance, implementing Random Forest.

2.3.1 Decision trees

The decision tree classifier constructs a logical sequent, in the form of a treedata structure. The idea is to identify the attribute, at the current node in thetree, on which one can gain the most information from splitting on. This isdone by comparing the entropy of the data set before and after the split, i.e.the information gain of the split;

I(S,A) = Ent(S)−∑

v∈V alues(A)

|Sv||S|

Ent(Sv), (4)

where S is the training set, A is a attribute of the training set, and Sv is the setafter the split containing the value v on attribute A. Ent(S) is the entropy ofthe set S. Once the tree has been trained the sequent formed is used to classifyunseen data points.

2.3.2 Bagging

Decision trees can attain very different results even if they are trained on thesame dataset making them unpredictable (James, G. et al, 2014). Bootstrapaggregation, bagging in short, is an attempt to make decision trees more pre-dictable when trained on the same dataset in order to achieve consistent pre-diction accuracy rates.

Bagging uses bootstrapped replicates of the training set to train several in-stances of and combine a simple classifier, fb to form a more powerful predictor,fbag. This is done by creating B bootstrapped training sets, Sb, by samplingwith replacement from the given training set S = {(~x1, y1), ..., (~xm, ym)}, wherepossibly |Sb| > m. The classifier fb is then trained using Sb, and unseen datapoints are classified using;

fbag(~x) = maxk

∑b

Ind(fb(~x) = k), (5)

7

where k is the class, and Ind(a = b) = 1 if a = b and Ind(a = b) = 0 if a 6= b.

2.3.3 Random forest

The random forest method is similar to bagged trees, it uses bootstrapping toconstruct several training sets and subsequently trains decision trees on theseto create a classifier with greater accuracy than the individual decision trees. Itdiffers from bagging in the way the predictors are used in splitting. Whenever asplit in a tree is considered, only a random subset n of the available predictorsare provided as possible candidates for splitting (James, G. et al, 2014). Thisis done to prevent the algorithm from creating very similar trees due to oneparticularly strong predictor always being chosen in the splitting process.

2.4 Extreme learning machines

The extreme learning machine is a type of artificial neural network, namelya single layer feed-forward network (SLFN). This means that the algorithmtakes input and pushes it forward to hidden nodes, which in turn output aclassification for new data points. The standard mathematical representationof an SLFN is;

N∑i=1

~βifi(~xj) =

N∑i=1

~βifi(~ai~xj + bi) = tj , j = 1...n, (6)

where N is the number of nodes in the hidden layer, ~ai is the is the input-weightvector for the ith node, ~βi is the output-weight vector for the ith node, bi isthe threshold of the ith node, and fi(~x) is the activation function (or kernel inSVM terminology) (Liu, X, et al. 2013). This can in turn be written compactlyas Hβ = T , where H is an n×N matrix and Hi,j = f(~aj~xi + bj), β is a vectorwith βT

i in position i, and T is a vector with the sample outputs. To train theclassifier one initiates all ~ai and bi to random values, and compute the output-weight vector β from β = H+T , where H+ is the Moore-Penrose pseudo-inverseof H (Li, X. et al 2016).

2.5 Naive Bayes

The naive bayes classifier is a probability based classifier based on Bayes’ The-orem. The main idea is to, instead of fitting k-dimensional data in to oneprobability distribution, to fit the data into k separate, one-dimensional distri-butions. In order to use the algorithm, a critical assumption has to be made,namely that the features are regarded as independent. This assumption is whatmakes this algorithm ’naive’. Using the maximum a priori method (MAP), thenaive bayes can be implemented as follows:

YMAP = arg maxy∈Y P (y|x1, ..., xk) = argmaxy∈YP (x1,...,xk|y)P (y)

P (x1,...,xk)=

8

= argmaxy∈Y

P (x1, ..., xk|y)P (y), (7)

where ~x = (x1, ..., xk) is a k-dimensional data point, and Y = {1, 2, ..., yn}are the possible classes. Since the features are regarded as independent we areassuming that P (x1, ..., xk|y) =

∏Kk=1 P (xk|y). As such the classifier becomes

YMAP = argmaxy∈Y P (y)∏K

k=1 P (xk|y) (James, G. et al, 2014).

2.6 Cross-validation

Cross-validation is used to ensure that the prediction accuracy of a learningalgorithm is properly estimated. The data set is split into p partitions and thealgorithm is trained on p − 1 of the subsets and then the predictive accuracyof the algorithm is tested on the remaining subset. This process is repeated ptimes and called p-fold cross-validation, using every subset as the testing setonce. The performance of the different tests are then averaged.

9

3 Method

The aim of this paper is to evaluate the influence of using a volatility and volumemeasure as an indicator of future stock price developments. This will be doneby running the different algorithms on sample data sets using, firstly, only pastend-of-day stock prices for different number of past days included. Then it willbe also be run on combined price with volatility measures, combined price withvolume and lastly, combined price, volatility and volume measures.

3.1 Price

The price used will be normalized, i.e. every end-of-day stock price will in factbe a percentage of the start-of-day price.

3.2 Volatility measure

Due to the lack of precision data the volatility will have to be approximated. Inthis paper we will approximate the volatility by letting the highest and lowestprices of the trading day be parameters of the data used to train the machinelearning algorithms. As with the price we also here use the percentage of thestart-of-day price.

3.3 Volume

The volume parameter is also normalized. This is done by taking the averagetrading volume over the entire time interval and dividing each individual volumedata point by this number.

3.4 Days looking backwards

The algorithms are run on 5 data input using five different time aspects. Theseare looking backwards on the 2, 3, 5, 10, and 30 previous trading days. Thevarious combinations mentioned above are implemented on each of the previousdays. This gives, for example, nine features per data point for the algorithmsrunning price and volatility (highest and lowest points) looking backwards 3days.

3.5 Data set

The algorithms will be run on the S&P500 index. The data set contains dailytrading information for the stocks from January 3 2000 to 27 March 2016, includ-ing opening and closing prices, highest and lowest prices, and dividend adjustedprice change.

10

3.6 Accuracy

To evaluate the performance of the algorithms we look at the prediction accu-racy. Two different approaches to determining the significance of the accuracyare used. The ELM algorithm is run ten times on the different data sets andan average of the prediction accuracy is taken to ensure a representative value.On the other three algorithms 10-fold cross-validation is used.

3.7 Algorithmic specifications

For both ELM and SVM the radial basis kernel, Φ(~x) = e−(~x−~k)2

r2 , is used. ForELM 100 hidden nodes are used. For Random Forest 100 trees are trained withunlimited depths. For Naive-Bayes and Random Forest the machine learningtoolkit WEKA is used. For the SVM implementation LibSVM is used.

11

4 Experimental results

4.1 SVM

The following tables show the accuracy of the SVM using the different combi-nations of input features. The accuracy is given in percentages.

Days looking backward Accuracy2 50.96793 51.05395 51.446810 51.313530 51.6408

Table 1: SVM prediction accuracy on price features


Table 2: SVM prediction accuracy on price and volatility features


Table 3: SVM prediction accuracy on price and volume features

12


Table 4: SVM prediction accuracy on price, volatility and volume features

As seen, all configurations aside from price only lead to similar predictionaccuracy rates. Accuracy rates range approximately between 51% and 51.7%.The highest score was attained by the 30-day price only configuration, followedby the 3-day price and volatility set. The price only configuration is the onlyone that appears to have more accurate results with the increase of used pastdays.

13

4.2 ELM

The following tables show the accuracy of the ELM using the different combi-nations of input features. The accuracy is given in percentages.


Table 5: ELM prediction accuracy on price features


Table 6: ELM prediction accuracy on price and volatility features


Table 7: ELM prediction accuracy on price and volume features

14


Table 8: ELM prediction accuracy on price, volatility and volume features

Here, no one configuration of features consistently outperforms any of theothers. Accuracy rates range approximately between 51% and 54%. The highestrate is achieved by the 2-day price and volatility set but has lower rates thanboth the price only and the complete configuration when the number of usedpast days increases.

15

4.3 Random Forest

The following tables show the accuracy of the Random-Forest algorithm usingthe different combinations of input features. The accuracy is given in percent-ages.


Table 9: Random-Forest prediction accuracy on price features


Table 10: Random-Forest prediction accuracy on price and volatility features


Table 11: Random-Forest prediction accuracy on price and volume features

16


Table 12: Random-Forest prediction accuracy on price, volatility and volumefeatures

Accuracy rates range approximately between 50% and 53%. All configura-tions except price and volume follow a similar pattern. The 3-day price andvolume achieves the best accuracy, followed by the 30-day price volatility andvolume configuration.

17

4.4 Naive-Bayes

The following tables show the accuracy of the Naive-Bayes algorithm using thedifferent combinations of input features. The accuracy is given in percentages.


Table 13: Naive-Bayes prediction accuracy on price features


Table 14: Naive-Bayes prediction accuracy on price and volatility features


Table 15: Naive-Bayes prediction accuracy on price and volume features

18


Table 16: Naive-Bayes prediction accuracy on price, volatility and volume fea-tures

Accuracy rates range approximately between 51.5% and 53%. The 10-dayprice only configuration achieves the best accuracy, followed by a tie betweenthe 2-day price only configuration and the 2-day complete configuration.

19

4.5 All algorithms

Accuracy ranges approximately from 51.5% to 52.5%, making the gap narrow,compared to the individual results.

20

4.6 All configurations

Accuracy ranges from 51.2%− 53% at 2 days and the gap continually decreasesas more days are added. At 30 days the accuracy gap has closed in to a rangefrom 51.5%− 52.5%.

21

5 Discussion

5.1 SVM

There are trends in the data. All features combinations perform better for 3days looking backwards than 2 days. The price only measure has a generalupwards curve of prediction accuracy with more days looking backwards. Thisindicates that 2 days is suboptimal for prediction. The price only measure isoutperformed by all other measures on 2 and 3 days, but in turn outperforms allother measures for 5 days and more, with the exception of price and volatilityfor 10 days. This indicates that the additional information is useful on a smallertimescale, but impairs prediction accuracy on longer ones by creating unneces-sary noise. The exception, as mentioned, being price and volatility on 10 days.This, together with the fact that price and volatility constantly outperforms thevolume included measures, suggests that the volatility measure is more reliablethan the volume measure. However, since the top performing measure is theprice only for 30 days implies that volatility only is reliable for shorter timeframes.

It is worth noting that the accuracy rates fit in narrow range between 51% −51.7%. This makes it uncertain whether the different configurations have a sig-nificant influence on the learning algorithm. Furthermore, the SVM achievedthe lowest accuracy rates out of all tested methods.

The given results from the SVM can be seen as reliable, due to the use ofcross-validation in all configurations.

5.2 ELM

The performance of the price and volatility and price and volume measureshave similar developments over difference in days, with the price and volatilitymeasure constantly performing better or on par with price and volume. Thisindicates that volatility is a better feature for indicating stock behavior. How-ever, the price only measure in turn outperforms both of these for 3 and moredays. The price, volatility and volume measure performs at or around the bestmeasure for all days except 30. This decline in accuracy could be caused bythe large number of input parameters looking back 30 days. Should this be thecase, it would seem that the ELM algorithm is able to use both volatility andvolume together to create a measure better than the individual measures.

The accuracy has a wider range than that of the SVM discussed above, witharound 3 percentage points between the best and worst performer. The ELMalso receives some of the higher accuracy rates out of all algorithms.

Due to the implementation limitations of the used ELM algorithm, cross-validationwas not used. Instead, reliability was calculated by using a part of the dataset

22

as a testing set. Although, with enough iterations, the results can be con-sidered reliable, it would have been better to assure training consistency withcross-validation.

5.3 Random Forest

All measures perform comparably for 2 and 30 days. The price only measureis outperformed on all timescales in between, which implies that the additionof information is beneficial for the predictive accuracy of the random forest al-gorithm. The addition of volume gives rise to significant rise in accuracy for 3days and also outperforms all other measures for 5 days. The volatility, how-ever, measure performs better for 10 and 30 days. The combined volatility andvolume measure performs better or on par with the volatility measure for alltimeframes. This signals that the volume measure should be used for all num-ber of days, with the addition of the volatility measure when using more than5 days of looking backwards, to create the optimal random forest stock predictor.

These differences in configuration can be seen as potentially significant dueto the reasonably wide range of accuracy rates of about 2.5 percentage points,close to the range of the ELM.

Training results have been made consistent by applying 10-fold cross-validationand can thus be seen as reliable.

5.4 Naive Bayes

The price only measure performs better, or marginally worse, than all othercombinations of measure for all days, with a sharp decline in accuracy from10 to 30 days. This suggests that the addition of features impairs the perfor-mance of the naive bayes classifier, for stock data. The volume configuration,despite receiving low accuracy rates for 5 and 10 days, seems to improve for the30-day period. This could explain the strong result for the 30-day combinedprice, volatility and volume configuration, possibly indicating that volume isuseful when observing long time periods. There is significant difference in theperformance of the volatility and volume measures, compared to the price onlymeasure. However as the accuracy of the price only configuration drops at the30-day mark, the combined price, volatility and volume measure as well as justprice and volume increases. This could indicate that more data features thanjust price may increase accuracy for longer timescales but are less beneficialwhen only looking back a few number of days.

The accuracy range is almost exactly the same for the Naive Bayes Classifier asit is for the Random Forest algorithm, but reaches slightly higher rates for theprice only configuration.

23

Same as previously, the reliability of the Naive Bayes Classifier is safeguardedby 10-fold cross-validation.

5.5 All algorithms

The price only configuration seems to be have improved accuracy rates withthe increase of used past days. Price and volatility together with the completeprice, volatility and volume configurations attain the highest accuracy rates atthe 2-day mark and then gradually worsen. The price and volatility as well asprice and volume experience a slight recovery at 30 days but no similar effect inthe complete configuration is seen. Price and volume appears to have the lowestconsistent accuracy, being surpassed for all days by the complete configuration,except for 30 days where the two sets reach very similar results.

5.6 All configurations

At the 2 days mark there are two clear groups, the ELM and Naive-Bayessignificantly outperforming the Random Forest and SVM. As more days areadded, the performance of Naive-Bayes and ELM decrease and the RandomForest performance increases, leaving all three on a comparable level. TheSVM, however, only make negligible changes in performance. This indicatesthat Random Forest is better suited for longer timescales, whereas ELM andNaive-Bayes are better suited for shorter ones, and SVM performance is notsignificantly affected by addition of past data.

24

6 Conclusions

It is difficult to draw any general conclusions that apply to all algorithms, insteadwe can conclude that the optimal combinations of features and days of thealgorithms vary. For the SVM, using additional measures as opposed to onlyprice data, is beneficial for short time frames. When using additional features,volatility should be used. However, as more days are used this addition of datawill impair predictive performance. As with the SVM, the volatility measureoutperforms volume for the ELM. Combing the two does seem to give a boostin performance for days 3 to 10. Unlike the SVM, for the ELM there areno trends in optimal performance over time. However, using the price andvolatility measure for 2 days outperforms all other combinations and algorithms.For Random Forest the addition of feature parameters increases performancesignificantly. The volume measure performs well for shorter time frames, and thevolatility for longer ones. The volatility and volume measure is carried for days 3and 5 by the volume measure and for days 10 and 30 by the volatility measure, tocreate a stable predictor over all time frames. As opposed to the Random Forestalgorithm, the Naive-Bayes performs worse in general when additional featuresare added to the price only measure. In tweaking prediction performance thenumber of used past days is fundamental for all algorithms except SVM. Theusage of different features to boost performance is also highly dependent onthe number of days. In general the volatility measure performs better than thevolume measure, and they are both outperformed by the combined volatilityvolume measure. However, in some cases the price only measure outperformsall three other by a significant margin. Based on this we draw the conclusion thatwhat feature, number of used past days, that performs best is highly dependenton the algorithm. However, it would be beneficial to always at least considerusing the volatility measure when designing machine learning algorithms.

25

7 Sources

Liu, X., Wang, l., Huang, G., Zhang, J., Yin, J. (2013). Multiple kernel extremelearning machine. 149 Neurocomputing, 253-264.

Wang, F., Zhao, Z., Li, X. Yu, F., Zhang, H. (2014), Stock Volatility Pre-diction using Multi-Kernel Learning based Extreme Learning Machine. Beijing:2014 International Joint Conference on Neural Networks.

Yeh, C., Huang, C., Lee, S. A multuple-kernel support vector regression ap-proach for stock market price forecasting. (2011), 38 Expert Systems withApplications, 2177-2186.

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An introductionto statistical learning. New York, NY: Springer, p. 175-197, 303-330, 337-366.

Wang, B., Huang, H., Wang, X. A support vector machine based MSN modelfor financial short-term volatility prediction. (2013), 22 Nueral Computing &Applications, 21-28.

Ding, S., Zhao, H., Zhang, Y., Xu, X., Nie, R. Extreme learninh machine:algorithms, theory and applications. (2013), 44 Artificial Intelligence Review,103-115.

Audrino, F., Colangelo, D. Semi-parametric forecasts of the implied volatil-ity surface using regression trees. (2009), 20 Statistical Computing, 421-434.

Black, F., Shcoles, M. The Pricing of Options and Corporate Liabilities. (1973),81 Journal of Political Economy, 637-654.

Bruce, A. The Market Model of Interest Rate Dynamics. (1997), 7 Mathe-matical Finance, 127-147.

Li, X., Xie, H., Wang, R., Cai, Y., Coa, J., Wang, F., Min, H., Deng, X. Em-pirical analysis: stock market prediction via extreme learning machine. (2013),27 Neural Computing & Applications, 67-78.

Cao, L., Tay, F. Financial Forecasting using Support Vector Machines. (2001),10 Neural Computing & Applications 184-192.

Fletcher, T., Shawe-Taylor, J. Multiple Kernel Learning with Fisher Kernelsfor High-Frequency Currency Prediction. (2013), 42 Computational Economics,217-240.

Yu, L., Yue, W., Wang, S., Lai, K. Support vector machine based multiagentensemble learning for credit risk evaluation. (2010), 37 Expert Systems & Ap-

26

plications 1351-1360.

Snopek, L. The Complete Guide to Portfolio Construction and Management.(2012).

27

www.kth.se

technical analysis inspired machine learning for stock ...928205/fulltext01.pdf · examensarbete...

Documents