Using support vector machine with a hybrid feature selection method to the stock trend prediction

Using support vector machine with a hybrid feature selection method to the stock trend prediction. Ming-Chi Lee, Expert Systems with Applications, 2009. Presenter: Yu Hsiang Huang. Date: 2012-05-17.


TRANSCRIPT

Page 1: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Using support vector machine with a hybrid feature selection method to the stock trend prediction

Ming-Chi Lee
Expert Systems with Applications, 2009

Presenter: Yu Hsiang Huang
Date: 2012-05-17

Page 2: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Outline
• Introduction
• Feature selection
• Research design
• Experimental results and analysis
• Conclusion

Page 3: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Introduction
• Stock market
  – Highly nonlinear dynamic system
• Application of AI
  – Expert system, fuzzy system, neural network
  – Back propagation neural network (BPNN)
    • Power of prediction is better than the others
    • Requires a large amount of training data to estimate the distribution of the input patterns
    • Prone to over-fitting
    • Depends fully on the researcher's experience or knowledge to preprocess data
      – relevant input variables, hidden layer size, learning rate, momentum, etc.

Page 4: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Introduction
• In this paper
  – Support vector machine (SVM)
    • Captures geometric characteristics of the feature space without deriving network weights from the training data
    • Extracts the optimal solution with a small training set size
    • Local optimal solution (BPNN) vs. global optimum solution (SVM)
    • Resistant to over-fitting
    • Classification performance is influenced by the dimension, or number, of feature variables
  – Feature selection
    • Addresses the dimensionality reduction problem by determining the subset of available features that is most essential for classification
    • Hybrid feature selection: filter method + wrapper method → F_SSFS
    • F_SSFS: F-score + Supported sequential forward search
    • Optimal parameter search
  – Compare performance between BPNN and SVM

Page 5: Using support vector machine with a hybrid feature selection method to the stock trend prediction

SVM-based model with F_SSFS (flowchart)
• Data
  → Original feature variables
  → Hybrid feature selection
    – Filter part: feature pruning using F-score
    – Pre-selected features
    – Wrapper part: SSFS algorithm finds the best feature variables
  → Best feature variables
  → SVM: training, testing, evaluating the classification accuracy

Page 6: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Feature selection
• Filter method:
  – No feedback from the classifier
  – Estimates the classification performance by some indirect assessments
    • Distance: reflects how well the classes separate from each other

(Figure: filter method. Classification performance is estimated by distance; no feedback from the classifier.)

Page 7: Using support vector machine with a hybrid feature selection method to the stock trend prediction

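Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
  – F-score
    • Plays the role of the filter
    • Pre-selects the “informative” features
    • Given training vectors $x_k$, $k = 1, 2, \dots, m$
    • $n_+$ and $n_-$ are the numbers of positive and negative instances
    • F-score of the $i$-th feature:

      $$F(i) = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}{\frac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2 + \frac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2}$$

    • $\bar{x}_i$, $\bar{x}_i^{(+)}$, $\bar{x}_i^{(-)}$ are the averages of the $i$-th feature over the whole, positive, and negative data sets
    • $x_{k,i}^{(+)}$ and $x_{k,i}^{(-)}$ are the $i$-th feature of the $k$-th positive and negative instance
    • The numerator indicates the discrimination between the positive and negative sets
    • The denominator indicates the discrimination within each of the two sets
    • The larger the F-score, the more likely the feature is discriminative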
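The F-score bullets above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definition (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances); the function name `f_score` and the toy data are hypothetical and not taken from the paper.

```python
from statistics import mean


def f_score(values, labels):
    """F-score of one feature; labels are 1 (positive) or -1 (negative)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    mean_all, mean_pos, mean_neg = mean(values), mean(pos), mean(neg)

    # Numerator: squared distance of each class mean from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2

    # Denominator: sample variance (n - 1 divisor) within each class.
    var_pos = sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1)
    return between / (var_pos + var_neg)


# Hypothetical toy data: feature_a separates the classes, feature_b does not.
labels = [1, 1, 1, -1, -1, -1]
feature_a = [0.9, 0.8, 1.0, 0.1, 0.2, 0.0]
feature_b = [0.5, 0.1, 0.9, 0.4, 0.2, 0.8]

score_a = f_score(feature_a, labels)
score_b = f_score(feature_b, labels)
print(score_a, score_b)
```

On this toy data the separating feature receives a much larger score than the noisy one, which is the ranking behaviour the filter relies on.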

Page 8: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
  – F-score (flowchart)
    Original feature variables → Calculate F-score → Sort F-score → Select top K F-score features → K pre-selected features
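The sort-and-select step of the filter can be illustrated as follows. This is only a sketch: the helper `select_top_k` and the F-score values are hypothetical, and ties are broken by whatever order `sorted` leaves them in.

```python
def select_top_k(scores, k):
    """Return the names of the k features with the highest F-score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical F-scores for five features (not values from the paper).
scores = {"f1": 0.02, "f2": 1.35, "f3": 0.40, "f4": 0.91, "f5": 0.07}

preselected = select_top_k(scores, k=3)
print(preselected)
```

Only the features returned here would be handed to the wrapper part, so K directly bounds how many candidates the wrapper has to evaluate.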

Page 10: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Feature selection
• Wrapper method:
  – Classifier-dependent
    • Evaluates the “goodness” of the selected feature subset directly, using feedback from the classifier
    • Should intuitively yield better performance
  – Has limited applications
    • Due to the high computational complexity involved

(Figure: wrapper method, with feedback from the classifier.)

Page 11: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
  – Supported sequential forward search (SSFS)
    • Plays the role of the wrapper
    • A variation of the sequential forward search (SFS) algorithm that is specially tailored to SVM to expedite the feature searching process
    • Support vectors: training samples other than the support vectors make no contribution to determining the decision boundary
    • Dynamically maintains an active subset as the candidates for the support vectors
    • Trains the SVM on the reduced subset rather than the entire training set, at less computational cost
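The property that SSFS exploits, namely that only the support vectors determine the decision boundary, can be checked with a small experiment. The sketch below is not the SSFS algorithm itself; it assumes scikit-learn's `SVC` and hypothetical toy data, and simply retrains on the support vectors to show that the predictions barely change.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical toy data: two well-separated Gaussian clusters.
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(100, 2)),
    rng.normal(2.0, 1.0, size=(100, 2)),
])
y = np.array([-1] * 100 + [1] * 100)

# gamma is fixed so that both models use exactly the same kernel.
full = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
sv = full.support_  # indices of the support vectors in the training set

# Retrain on the support vectors only (a much smaller training set).
reduced = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X[sv], y[sv])

agreement = np.mean(full.predict(X) == reduced.predict(X))
print(f"{len(sv)} of {len(X)} samples are support vectors")
print(f"prediction agreement: {agreement:.3f}")
```

Because the reduced model is trained on far fewer samples, each evaluation inside a wrapper loop becomes cheaper, which is the motivation given on the slide.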

Page 12: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
  – Supported sequential forward search (SSFS)

      f1   f2   f3   f4   …    fk-2   fk-1   fk   label
r1    …    …    …    …    …    …      …      …    +
r2    …    …    …    …    …    …      …      …    -
…     …    …    …    …    …    …      …      …    -
rN    …    …    …    …    …    …      …      …    +

Page 13: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
  – Supported sequential forward search (SSFS)
    • Iteration = 1
    • Iteration = n+1
    • Termination
      1. No significant reduction of the objective function M is found
      2. Desired number of features has been obtained

Page 14: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
  – F_SSFS
    • Uses the F-score measure to decide the pre-selected feature subset
    • Uses the SSFS algorithm to select the final best feature subset
    • Reduces the number of features that have to be tested through the training of SVM
    • Reduces the unnecessary computation time that the wrapper method would spend testing the “non-informative” features

Page 15: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Research design
• Data collection and preprocessing
  – Prediction target: the direction of change in the daily NASDAQ index
  – Index futures lead the spot index
  – Uses 30 technical indices as the whole feature set
  – 20 futures contracts, 9 spot indexes, and the 1-day lagged NASDAQ index
  – Uses “1” and “-1” to denote that the next day’s index is higher or lower than today’s
  – From Nov 8, 2001 to Nov 8, 2007, with 1065 observations per feature
  – The original data are scaled into the range (0, 1)

        f1   f2   f3   …    …    f28   f29   f30   label
1       …    …    …    …    …    …     …     …     1
2       …    …    …    …    …    …     …     …     -1
…       …    …    …    …    …    …     …     …     -1
1065    …    …    …    …    …    …     …     …     1
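The labeling and scaling steps listed above might look like the following in Python. This is a sketch under assumptions: the function names and closing values are hypothetical, a day with no change is labeled -1 (the slide does not say how ties are treated), and min-max scaling is assumed, which maps the extremes to exactly 0 and 1.

```python
def direction_labels(closes):
    """Label day t with 1 if the next day's close is higher, else -1."""
    return [1 if nxt > cur else -1 for cur, nxt in zip(closes, closes[1:])]


def min_max_scale(column):
    """Linearly rescale a feature column so its values span 0 to 1."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical closing values, not taken from the paper's data set.
closes = [100.0, 102.0, 101.0, 105.0, 105.0]

labels = direction_labels(closes)
scaled = min_max_scale(closes)
print(labels)
print(scaled)
```

Note that the last day has no label, since its next-day value is unknown; a series of n closes yields n - 1 labeled observations.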

Page 16: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Research design
• SVM-based model with F_SSFS
  – Filter part
    • Calculate the F-score for every feature and rank the features without involving the classifier
    • Sort the F-scores and select the K (threshold) highest-scored features to construct the feature subset
  – Wrapper part
    • For each pre-selected feature, run 5-fold cross-validation and calculate the average accuracy over the 5 folds
    • Determine the feature to be added to the best feature subset, using the objective function M
    • Repeat…
    • Until no significant increase in cross-validation accuracy is found or the desired number of features has been obtained
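The wrapper loop can be approximated with a plain greedy forward search. The sketch below is a stand-in, not the SSFS algorithm: it retrains on the full training set at every step, uses 5-fold cross-validated accuracy as the selection criterion, and relies on scikit-learn. The function `forward_select`, the `min_gain` threshold, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def forward_select(X, y, candidates, max_features, min_gain=0.005):
    """Greedy forward search scored by 5-fold cross-validated SVM accuracy."""
    selected, best_acc = [], 0.0
    while candidates and len(selected) < max_features:
        # Score each remaining candidate when added to the current subset.
        trial = {
            f: cross_val_score(SVC(kernel="rbf"), X[:, selected + [f]], y, cv=5).mean()
            for f in candidates
        }
        f_best = max(trial, key=trial.get)
        if trial[f_best] - best_acc < min_gain:
            break  # no significant increase in accuracy
        selected.append(f_best)
        best_acc = trial[f_best]
        candidates = [f for f in candidates if f != f_best]
    return selected, best_acc


# Hypothetical data: only columns 0 and 1 carry class information.
rng = np.random.default_rng(0)
y = np.array([-1, 1] * 100)
X = rng.normal(size=(200, 6))
X[:, 0] += 1.5 * y
X[:, 1] -= 1.5 * y

selected, acc = forward_select(X, y, candidates=list(range(6)), max_features=3)
print(selected, round(acc, 3))
```

Both stopping rules from the slide appear here: the loop ends when the accuracy gain falls below `min_gain` or when `max_features` features have been chosen.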

Page 17: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Research design
• Modeling for support vector machine
  – Model selection and parameter search
    • Radial basis function (RBF) kernel
    • Kernel parameter γ and penalty parameter C
    • Grid-search on (C, γ) using 5-fold cross-validation
      – Prevents the over-fitting problem
      – Computational time to find good parameters is less than with other methods
      – Grid-search can be easily parallelized because each (C, γ) pair is independent
      – Try exponentially growing sequences of (C, γ)
        » C = …
        » γ = …
    • Final performance of the classifier is evaluated by the mean cost over the v-fold subsets
    • Use LIBSVM software to conduct the SVM experiments
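A grid search of this kind can be sketched with scikit-learn, whose `SVC` is built on LIBSVM. The exponent ranges below are illustrative assumptions, not the values used in the paper, and the toy data is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical toy data standing in for the scaled feature matrix.
rng = np.random.default_rng(0)
y = np.array([-1, 1] * 100)
X = rng.normal(size=(200, 4))
X[:, 0] += 1.5 * y

# Exponentially growing sequences; these particular ranges are illustrative.
param_grid = {
    "C": [2.0 ** e for e in range(-5, 12, 4)],
    "gamma": [2.0 ** e for e in range(-15, 4, 4)],
}

# Each (C, gamma) pair is scored by 5-fold cross-validated accuracy.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Since every grid point is evaluated independently, raising `n_jobs` spreads the work across cores without changing the result.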

Page 18: Using support vector machine with a hybrid feature selection method to the stock trend prediction

SVM-based model with F_SSFS (flowchart)
• Data
  → Original feature variables
  → Hybrid feature selection
    – Filter part: feature pruning using F-score
    – Pre-selected K features
    – Wrapper part: SSFS algorithm finds the best feature variables
  → Best feature variables
  → SVM: training, testing, evaluating the classification accuracy

Page 19: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Experimental results and analysis
• Experimental result of F_SSFS
  – Threshold K determines how many features we want to keep after filtering.
    • If K equals the number of original features, the filter part does not contribute at all
    • If K equals 1, the wrapper method is unnecessary

Page 20: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Experimental results and analysis
• Experimental result of F_SSFS – filter part
  – Set …
  – K ↑, accuracy of prediction ↑, selection process time ↑
  – The performance and complexity of the algorithm can be balanced by tuning K
  – Choose K = 22; after the process of the wrapper part, 17 feature variables turned out to have …

Page 21: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Experimental results and analysis
• Experimental result of F_SSFS – wrapper part
  – Choose K = 22; after the process of the wrapper part
  – 17 features are left, with an average accuracy rate of 81.7%

Page 22: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Experimental results and analysis
• Result of SVM model selection
  – RBF kernel
    • Penalty parameter C, kernel parameter γ
    • Grid-search using 5-fold cross-validation
  – Optimal (C, γ) is (…) with a cross-validation rate of 87.1%

Page 23: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Experimental results and analysis
• Experimental result of SVM
• Experimental result of BPNN

Page 24: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Experimental results and analysis
• Experimental result of feature selection
  – Key deficiency of neural-network models for stock trend prediction
    • Difficulty in selecting the discriminative features and explaining the rationale for the stock trend prediction
  – Relative importance of each feature

Page 25: Using support vector machine with a hybrid feature selection method to the stock trend prediction

Experimental results and analysis
• Conclusion
  – Stock trend prediction
  – Support vector machine with hybrid feature selection method (F_SSFS)
  – Reduces the high computational cost and the risk of over-fitting
  – Need to investigate how to determine the optimal values of the SVM parameters for the best prediction performance
  – Generalization of SVM on the basis of the appropriate level of the training set size, and a guideline to measure the generalization performance