Post on 20-Dec-2015
Stock Price Prediction from Natural Language Understanding of News Headlines
Machine learning experiment:
• Task is to predict whether a stock will rise or fall significantly in reaction to the appearance of some news headline about the stock.
• The agent is trained on a year's worth of dated headlines and closing prices for 100 stocks.
• Performance is measured by the number of correctly classified test headlines: does each headline predict a RISE, a FALL or NOTHING?
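The labelling and scoring described above could be sketched roughly as follows. The slides do not say what counts as a "significant" move, so the `threshold` parameter and the function names `label_move` and `accuracy` are assumptions made purely for illustration.

```python
def label_move(prev_close, close, threshold=0.05):
    """Label a closing-price move as RISE, FALL or NOTHING.

    threshold is a hypothetical cut-off for a "significant" move;
    the slides do not state the value actually used.
    """
    change = (close - prev_close) / prev_close
    if change >= threshold:
        return "RISE"
    if change <= -threshold:
        return "FALL"
    return "NOTHING"


def accuracy(predicted, actual):
    """Fraction of test headlines whose predicted label matches the true one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

A headline dated on a given day would then be paired with the label of that stock's move, and the classifier's predictions compared against those labels.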
Some related prior work
• Thomas Fawcett and Foster John Provost (1999) - Activity Monitoring: monitoring streams of data for something interesting.
• Lavrenko et al. (2000) - Ænalyst system: combines two time-series data streams, headlines and quotes.
• Sofus Macskassy (2003) - Information Filtering: prospective training data.
• Yu, H. and Hatzivassiloglou, V. (2003) - Towards Answering Opinion Questions: separating facts from opinions and identifying the polarity of opinions, using a Bayesian classifier.
Naïve Bayes Classifier
$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
$h \in H$, where $H = \{\text{RISE}, \text{FALL}, \text{NOTHING}\}$
priors:
P(RISE) = 0.013055
P(FALL) = 0.007976
P(NOTHING) = 0.978969
• Rises and falls are sparse.
• Each word in the collected dictionary has a count of its total occurrences and counts of its occurrences during a RISE and during a FALL.
• P(D|h) is estimated by multiplying together, for each word in the headline, the probability of that word occurring during a RISE, FALL or NOTHING.
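A minimal sketch of such a classifier is given here, assuming whitespace tokenisation, log-probabilities instead of a raw product, and add-alpha smoothing. None of these details are stated in the slides. The names `train`, `classify` and `TOY_DATA` are hypothetical, and the toy headlines are invented for the example. P(D) is omitted because it is the same for every class.

```python
import math
from collections import Counter

CLASSES = ("RISE", "FALL", "NOTHING")
# Priors as given on the slide.
PRIORS = {"RISE": 0.013055, "FALL": 0.007976, "NOTHING": 0.978969}

# Invented toy headlines, for illustration only.
TOY_DATA = [
    ("analyst upgraded shares", "RISE"),
    ("shares upgraded again", "RISE"),
    ("analyst downgraded shares", "FALL"),
    ("quarterly disclosure filed", "NOTHING"),
    ("annual disclosure filed", "NOTHING"),
]


def train(labelled_headlines):
    """Count how often each word appears in headlines of each class."""
    counts = {c: Counter() for c in CLASSES}
    for headline, label in labelled_headlines:
        counts[label].update(headline.lower().split())
    return counts


def classify(headline, counts, priors=PRIORS, alpha=1.0):
    """Return the class h maximising log P(h) + sum of log P(word | h)."""
    vocab = set().union(*counts.values())
    scores = {}
    for c in CLASSES:
        total = sum(counts[c].values())
        score = math.log(priors[c])
        for word in headline.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-alpha smoothing (an assumption) avoids log(0).
            score += math.log(
                (counts[c][word] + alpha) / (total + alpha * len(vocab))
            )
        scores[c] = score
    return max(scores, key=scores.get)
```

With uniform priors the toy model labels "upgraded" as RISE. With the slide's priors the same headline comes out as NOTHING, because the NOTHING prior is so large that a single word cannot outweigh it in this small example.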
Dictionary generated
• Human-understandable model
• Can be used for further agent design
TOTAL OCCURRENCES   DURING RISE   DURING FALL   WORD
 98                 11            0             upgraded
 59                  0            5             downgraded
 97                 21            1             bank
 58                  6            1             big
 15                  3            0             boosts
133                 11            3             deal
  9                  0            2             disappoint
 38                  2            8             drop
 31                  3            1             despite
189                  8            4             disclosure
 96                  3            1             growth
 53                  7            1             gains
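One way to read the dictionary is to turn each row into the fraction of a word's occurrences that coincided with a RISE or a FALL. The sketch below does this for the rows listed above. The names `WORD_COUNTS` and `rates` are hypothetical, and the calculation illustrates how the counts might be used rather than reproducing the author's code.

```python
# (total occurrences, during RISE, during FALL), copied from the dictionary.
WORD_COUNTS = {
    "upgraded": (98, 11, 0),
    "downgraded": (59, 0, 5),
    "bank": (97, 21, 1),
    "big": (58, 6, 1),
    "boosts": (15, 3, 0),
    "deal": (133, 11, 3),
    "disappoint": (9, 0, 2),
    "drop": (38, 2, 8),
    "despite": (31, 3, 1),
    "disclosure": (189, 8, 4),
    "growth": (96, 3, 1),
    "gains": (53, 7, 1),
}


def rates(word):
    """Return (rise_rate, fall_rate): the share of a word's occurrences
    that coincided with a RISE and with a FALL respectively."""
    total, rise, fall = WORD_COUNTS[word]
    return rise / total, fall / total


# Words ranked by how often they coincided with a RISE, highest first.
by_rise = sorted(WORD_COUNTS, key=lambda w: rates(w)[0], reverse=True)
# Words ranked by how often they coincided with a FALL, highest first.
by_fall = sorted(WORD_COUNTS, key=lambda w: rates(w)[1], reverse=True)
```

Comparing these rates with the priors P(RISE) and P(FALL) indicates which words are more informative than the base rate. Words with few total occurrences, such as "disappoint", give noisy estimates.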
Results
[Figure: % Accuracy vs. # of Training headlines. The y-axis runs from 0.93 to 1; the x-axis runs from 0 to 10000.]
Further work
• Bug fixes
• Something more than “bag of words”
• Per-symbol language models
• Simple word-based decision stumps as inputs to a boosting algorithm