stock prediction using social network
Post on 13-Apr-2017
Stock Prediction Using Social Network Data
Rohit Tiwari (rtiwari2) Chanon Hongsirikulkit (hongsir2)
Outline
- Introduction
- Data Sources
- APIs
- Filter Relevant Data
- Text Normalization
- Noise Removal
- Feature Extraction
- Topic Modeling
- Sentiment Analysis
- Tweet Features
- Prediction Model Construction
- Conclusion
- Future Works
Fake Tweet -> Stocks Plunged
Introduction
- Social networks are communication platforms that contain hidden, valuable knowledge.
- Information on social networks can reflect real-world events.
- Many studies exploit this information to enhance application capabilities:
  - Analyzing tweets that convey information needs (Zhao and Mei 2013)
  - Using the tweet rate to predict box-office revenues of movies (Asur and Huberman 2010)
- Our survey focuses on using social network data to predict stock market movement.

Examples of fake tweets moving markets:
- A false message on Twitter, "BREAKING: Two Explosions in the White House and Barack Obama is injured." -> The Dow Jones and S&P 500 indexes dropped by close to 1%, the equivalent of hundreds of billions of dollars changing hands.
- In August 2012, an Italian journalist set up a fake Twitter account for a member of Russia's government and tweeted that the president of Syria had been killed, causing brief fluctuations in the oil markets.
- Source: http://www.telegraph.co.uk/finance/markets/10013768/Bogus-AP-tweet-about-explosion-at-the-White-House-wipes-billions-off-US-markets.html
Formal Description: The Efficient Market Hypothesis (EMH)
- The EMH states that financial market prices incorporate all available information. It implies that market prices reflect changes in investor behavior, since investors take new information into account and act accordingly.
- Research asserts that investors' rational considerations are influenced by psychological biases and emotions.
- For several decades, direct surveys have been the prominent method for estimating public mood and investor sentiment. However, explicitly expressed survey responses can be manipulated or misreported, and surveys cannot take behavior-based indicators into consideration.
J. Bollen and H. Mao, Twitter Mood as a Stock Market Predictor, Computer, vol. 44, no. 10, pp. 91-94, 2011.
General Methodology for Stock Prediction
[Diagram: data sources -> (via APIs) -> relevant dataset -> data preprocessing (text filter, text normalization, noise removal) -> feature extraction (topic modeling, sentiment analysis, tweet features) -> features + training data -> classifiers -> results -> correlation / prediction capability testing]
Data Sources
- Twitter (Asur and Huberman 2010; Bollen and Mao 2011; Zhao and Mei 2013; Arias et al. 2015)
  - Streaming API -> collects real-time tweets
  - Search API -> searches and collects historical tweets from the past week
- Yahoo Finance (Nguyen et al. 2015)
  - Historical stock prices
  - Posts from the Yahoo Finance message board
- Sina Weibo (Liu et al. 2015)
  - A Chinese microblogging service similar to Twitter
Filter Relevant Data from Corpus
- Data collected from social networks contains both relevant and non-relevant messages for our specific domain.
- We need to keep only the relevant data.
- Approaches used in the literature:
  - Filter by keywords -> exploit hashtags or cashtags in the messages
  - Apply LDA for topic modeling and then keep only the related topics (Arias et al. 2015)
M. Arias, A. Arratia, and R. Xuriguera, Forecasting with Twitter Data, ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.
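
The keyword approach above can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: the keyword set, the tag pattern, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical keyword list for one company; these tags are illustrative only.
KEYWORDS = {"$aapl", "#aapl", "#apple"}

# A hashtag or cashtag: '#' or '$' followed by a letter, then letters/digits/underscores.
TAG_PATTERN = re.compile(r"[#$][A-Za-z][A-Za-z0-9_]*")


def extract_tags(text):
    """Return the set of lowercased hashtags and cashtags found in a message."""
    return {tag.lower() for tag in TAG_PATTERN.findall(text)}


def is_relevant(text, keywords=KEYWORDS):
    """A message is relevant if it shares at least one tag with the keyword list."""
    return bool(extract_tags(text) & keywords)


def filter_relevant(messages, keywords=KEYWORDS):
    """Keep only the messages that mention one of the tracked tags."""
    return [m for m in messages if is_relevant(m, keywords)]


if __name__ == "__main__":
    sample = ["Buying $AAPL today", "Nice weather", "Lunch cost $5 #lunch"]
    print(filter_relevant(sample))
```

Requiring a letter after `$` keeps plain prices such as `$5` from being treated as cashtags.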
Text Normalization
Text normalization is the primary step for refining the data. It can involve the following tasks:
- Stop word removal
- Punctuation removal
- Lowercase conversion
- Compressing elongated words: transform "Haaappyyyy" to "Happy". This is done over multiple iterations, and the result is validated with a dictionary lookup at the end.
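
The four tasks above can be sketched as follows. The stop-word list and dictionary are tiny placeholders, and the compression strategy (cap each run of repeated letters at two, then try shorter variants until one is found in the dictionary) is one plausible reading of the slide, not necessarily the procedure used in the surveyed papers.

```python
import string
from itertools import groupby, product

# Tiny illustrative resources; a real system would use much larger ones.
STOP_WORDS = {"i", "am", "the", "a", "is", "about"}
DICTIONARY = {"happy", "good", "stock", "today"}


def compress(word, dictionary=DICTIONARY):
    """Collapse elongated letter runs, validating candidates by dictionary lookup."""
    # Cap every run of identical letters at length 2 ("aaa" -> "aa").
    runs = [(ch, min(len(list(grp)), 2)) for ch, grp in groupby(word)]
    # For each run, try the capped length first, then a single letter.
    options = [[ch * n for n in range(cap, 0, -1)] for ch, cap in runs]
    for parts in product(*options):
        candidate = "".join(parts)
        if candidate in dictionary:
            return candidate
    # No dictionary match: fall back to the capped form.
    return "".join(ch * cap for ch, cap in runs)


def normalize(text):
    """Lowercase, strip punctuation, compress elongated words, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [compress(tok) for tok in text.split()]
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(normalize("I am Haaappyyyy about the stock today!!!"))
```

Note that `string.punctuation` includes `#` and `$`, so any hashtag or cashtag filtering should run before this step.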
Noise Removal in Tweets
- Standard tools remove noise by using IDF weights to identify and drop highly frequent terms.
- Named entity recognition (NER) system: built on a conditional random fields (CRF) model, it determines whether a tweet contains named entities related to companies (or another feature). If a tweet doesn't contain any named entity from the keyword list for the company, it is removed.
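
As a rough sketch of the IDF idea, the snippet below drops terms that occur in so many messages that their inverse document frequency falls below a threshold. The threshold value and function names are assumptions for illustration; the surveyed papers may weight or filter terms differently.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Compute idf(term) = ln(N / df(term)) over a list of tokenized messages."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in doc_freq.items()}


def drop_frequent_terms(docs, min_idf=0.5):
    """Remove terms whose IDF is below min_idf (i.e. terms that appear almost everywhere)."""
    idf = idf_scores(docs)
    return [[term for term in doc if idf[term] >= min_idf] for doc in docs]


if __name__ == "__main__":
    tweets = [
        ["rt", "apple", "up"],
        ["rt", "apple", "down"],
        ["rt", "google", "up"],
        ["rt", "tesla", "flat"],
    ]
    print(drop_frequent_terms(tweets))
```

Here "rt" appears in every message, so its IDF is 0 and it is removed, while the other terms survive.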
[Figure: cluttered information -> refined form]
Feature Extraction
Topic Modelling
- Some studies use the topics of messages as features for the forecasting model.
- Many approaches have been proposed for topic extraction:
  - Extracting n-grams (unigrams or bigrams)
  - Latent Dirichlet Allocation (LDA)
  - Joint Sentiment-Topic (JST) -> extracts sentiment information and topics from text data simultaneously
  - Aspect-based sentiment -> extracts topics first, then calculates sentiment scores based on the distance between topics and emotion words and on the importance of each topic (Nguyen et al. 2015)
Aspect-based sentiment algorithm
- Extracts topics first, then calculates sentiment scores based on the distance between topics and emotion words and on the importance of each topic (Nguyen et al. 2015)
[Figure: Algorithm for extracting topics from dataset]
[Figure: Algorithm for extracting topics and their sentiment values]
T. H. Nguyen, K. Shirai, and J. Velcin, Sentiment analysis on social media for stock movement prediction, Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.
Sentiment Analysis
- Some studies use sentiment information from social networks as features for their models.
- There are two ways to extract a sentiment score:
  - Use existing software to calculate sentiment scores
  - Construct a classifier for sentiment classification
- Popular tools:
  - GPOMS -> categorizes people's emotions into 6 categories: calm, alert, sure, vital, kind, and happy
  - OpinionFinder (OF) -> classifies sentiment as positive or negative
Constructing Sentiment Classifier
- Have experts annotate sentiment data and use it as training data
- Extract features from the training data -> n-grams, POS tagging
- Use a classifier (SVM, Linear Regression Model) to learn from the training data
- Apply the classifier to the entire collection
Extracting Sentiment Features
- Once the sentiment data has been classified, sentiment features can be generated in various ways.
- Examples of sentiment features used in the literature:
  - Average daily sentiment score
  - Sentiment index = Number of positive tweets / Total number of tweets
  - PNRatio = Number of positive tweets / Number of negative tweets
  - Sentiment polarity = (ptw - ntw) / (ptw + ntw)
    - ptw: number of positive tweets
    - ntw: number of negative tweets
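
The three ratio features above translate directly into code. In this sketch the label strings ("pos", "neg", "neu") and the choice to return None when a denominator is zero are assumptions made for the example, not something the source specifies.

```python
def sentiment_features(labels):
    """Compute ratio features from one day's per-tweet sentiment labels.

    labels: iterable of strings, assumed here to be 'pos', 'neg', or 'neu'.
    """
    labels = list(labels)
    total = len(labels)
    ptw = labels.count("pos")  # number of positive tweets
    ntw = labels.count("neg")  # number of negative tweets
    return {
        # Sentiment index = positive tweets / all tweets
        "sentiment_index": ptw / total if total else None,
        # PNRatio = positive tweets / negative tweets
        "pn_ratio": ptw / ntw if ntw else None,
        # Sentiment polarity = (ptw - ntw) / (ptw + ntw)
        "polarity": (ptw - ntw) / (ptw + ntw) if (ptw + ntw) else None,
    }


if __name__ == "__main__":
    day = ["pos"] * 6 + ["neg"] * 2 + ["neu"] * 2
    print(sentiment_features(day))
```

For a day with 6 positive, 2 negative, and 2 neutral tweets, the sentiment index is 0.6, the PNRatio is 3.0, and the polarity is 0.5.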
Sentiment Features Testing
- Goal: ensure that the sentiment information reflects real-world events and can be used for prediction.
- Approaches used in the literature (Bollen and Mao 2011):
  - Causality testing: tests the correlation between sentiment information and stock market prices (DJIA / VIX)
  - Self-organizing fuzzy neural network (SOFNN): tests the prediction capability of the sentiment information
J. Bollen, and H. Mao, Twitter Mood as a Stock Market Predictor, Computer, vol. 44, no. 10, pp. 91-94, 2011.
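
Before running a formal causality test, a quick exploratory check is to correlate sentiment on day t with the market series on day t + lag. The sketch below does only that: it is a plain lagged Pearson correlation, not the causality test used by Bollen and Mao, and the function names and toy data are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment[t] with returns[t + lag] for a non-negative lag in days."""
    if lag < 0 or lag >= len(sentiment):
        raise ValueError("lag must satisfy 0 <= lag < len(sentiment)")
    if lag == 0:
        return pearson(sentiment, returns)
    return pearson(sentiment[:-lag], returns[lag:])


if __name__ == "__main__":
    sentiment = [0.1, 0.5, 0.2, 0.8, 0.3, 0.6]
    # Toy returns constructed to follow sentiment with a one-day delay.
    returns = [0.0] + [2 * s for s in sentiment[:-1]]
    for lag in range(3):
        print(lag, round(lagged_correlation(sentiment, returns, lag), 3))
```

A high correlation at some lag only suggests that sentiment leads the market series; it does not establish predictive power on its own.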
Extracting Tweet Features
Some useful quantifiable information can be extracted from the corpus:
- Number of followers of the company, or of the famous personality tweeting about the company (a typical problem for the MapReduce framework)
- Tweet volume (related to a specific identity or hashtag)
- Retweet volume (related to a specific hashtag coupled with an identity)
- Tweet-rate = Number of tweets / Duration over which those tweets were generated
- Tweet length
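
A minimal sketch of computing several of these features from a batch of tweets follows. The record layout (dicts with `text`, `created_at`, and `is_retweet` keys) is hypothetical and does not mirror any particular API response format.

```python
from datetime import datetime


def tweet_features(tweets):
    """Compute volume, retweet volume, tweet rate, and average length for a batch.

    tweets: non-empty list of dicts with hypothetical keys
            'text' (str), 'created_at' (datetime), 'is_retweet' (bool).
    """
    if not tweets:
        raise ValueError("need at least one tweet")
    times = [t["created_at"] for t in tweets]
    duration_hours = (max(times) - min(times)).total_seconds() / 3600
    volume = len(tweets)
    return {
        "tweet_volume": volume,
        "retweet_volume": sum(1 for t in tweets if t["is_retweet"]),
        # Tweet-rate = number of tweets / duration over which they were generated
        "tweet_rate_per_hour": volume / duration_hours if duration_hours > 0 else None,
        "avg_tweet_length": sum(len(t["text"]) for t in tweets) / volume,
    }


if __name__ == "__main__":
    batch = [
        {"text": "abc", "created_at": datetime(2017, 4, 13, 10, 0), "is_retweet": False},
        {"text": "abcde", "created_at": datetime(2017, 4, 13, 11, 0), "is_retweet": True},
        {"text": "a", "created_at": datetime(2017, 4, 13, 12, 0), "is_retweet": False},
    ]
    print(tweet_features(batch))
```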
Prediction Model Construction
1. Combine the features from the previous steps
- Topic features
- Sentiment features
- Tweet features
- Historical stock price features (additional features)
Google Heat Map:
Gives a fair idea of how any kind of information is concentrated geographically, e.g., Facebook trends.
Iterative Training & Validation
2. Train the classifier -> SVM, Linear Regression, Neural Networks
3. Test and evaluate the model
- The most popular method is a windowing mechanism: the model groups the tweets in a window (w1) spanning several days and analyzes their sentiment or other features.
- In the subsequent window (w2) of 1-2 days, the stock indices are measured.
- w1 and w2 are then formally analyzed together to find interesting patterns.
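
The windowing mechanism can be sketched as below. The inputs are assumed to be aligned daily series; summarizing w1 by its mean sentiment and w2 by an up/down label relative to the last close in w1 are illustrative choices, and the window lengths are free parameters.

```python
def make_windows(daily_sentiment, daily_close, w1=3, w2=1):
    """Pair the mean sentiment over w1 days with the index movement over the next w2 days.

    Returns a list of (mean_sentiment, label) pairs, where label is 1 if the
    close at the end of w2 is above the close at the end of w1, else 0.
    """
    if len(daily_sentiment) != len(daily_close):
        raise ValueError("series must be aligned day by day")
    samples = []
    for start in range(len(daily_sentiment) - w1 - w2 + 1):
        end_w1 = start + w1   # index of the first day in w2
        end_w2 = end_w1 + w2  # one past the last day in w2
        mean_sentiment = sum(daily_sentiment[start:end_w1]) / w1
        change = daily_close[end_w2 - 1] - daily_close[end_w1 - 1]
        samples.append((mean_sentiment, 1 if change > 0 else 0))
    return samples


if __name__ == "__main__":
    sentiment = [0.2, 0.4, 0.6, 0.8, 1.0]
    close = [100, 101, 102, 101, 103]
    for features, label in make_windows(sentiment, close, w1=3, w2=1):
        print(round(features, 3), label)
```

Because each sample uses only sentiment from days before its label window, the pairs can be split chronologically into training and test sets without leaking future information.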
Correlation of sentiments & indices
This involves formally testing for causal correlation between social network sentiment and stock market indices from the Dow Jones, NASDAQ, NYSE, and VIX.
M. Arias, A. Arratia, and R. Xuriguera, Forecasting with Twitter Data, ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.
T. H. Nguyen, K. Shirai, and J. Velcin, Sentiment analysis on social media for stock movement prediction, Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.
Conclusion
- Information on social networks reflects real-world events.
- Social network data can be used to predict stock market movement to a certain degree.
- The knowledge extracted from social media can be applied to different applications:
  - Individual stock price prediction
  - Predicting the box-office revenue of a movie
  - Presidential/Senate election prediction based on campaign data
Future Works
- Work with datasets covering a longer duration -> some current works use only 15 transaction dates.
- Combining information from different data sources might improve prediction accuracy -> Twitter is known to contain a lot of noisy data.
- Come up with new features, such as the credibility of tweets -> most current research focuses on topic + sentiment without considering the reliability of the data.
References
- M. Arias, A. Arratia, and R. Xuriguera, Forecasting with Twitter Data, ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.
- L. Liu, J. Wu, P. Li, and Q. Li, A social-media-based approach to predicting stock comovement, Expert Systems with Applications, vol. 42, no. 8, pp. 3893-3901, 2015.
- T. H. Nguyen, K. Shirai, and J. Velcin, Sentiment analysis on social media for stock movement prediction, Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.
- S. Asur, B