Data Mining of Informational Stream in Social Networks


Posted on 25-Jan-2015


Page 1: Data Mining of Informational Stream in Social Networks

Forecasting of Social, Market and Financial Trends

Bohdan Pavlyshenko
e-mail: [email protected]
blog: bpavlyshenko.blogspot.com

Page 2

Used technologies: R, Python, Java, Hadoop/MapReduce/Pig/Hive

The prototype data mining systems are based on formal concept analysis and on the theory of frequent itemsets. A semantic concept lattice model makes it possible to analyze semantically related sets of words and to construct association rules.
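A formal concept is a pair (extent, intent): the extent is a set of tweets, and the intent is exactly the set of words that all of those tweets share. The extent, in turn, is exactly the set of tweets that contain all of those words. The minimal sketch below assumes each tweet has already been reduced to a set of keywords. The toy tweets are hypothetical. The brute-force enumeration (closing every subset of words) only suits tiny contexts and is not presented as the prototype's algorithm.

```python
from itertools import combinations

# Hypothetical toy context: each tweet reduced to its set of keywords.
TWEETS = {
    "t1": {"apple", "iphone", "buy"},
    "t2": {"apple", "iphone", "sell"},
    "t3": {"apple", "stock", "sell"},
    "t4": {"apple", "stock", "buy"},
}


def extent(words, context):
    """Tweets containing every word in `words`."""
    return frozenset(t for t, ws in context.items() if words <= ws)


def intent(tweets, context):
    """Words shared by every tweet in `tweets` (all words if `tweets` is empty)."""
    if not tweets:
        return frozenset().union(*context.values())
    return frozenset.intersection(*(frozenset(context[t]) for t in tweets))


def formal_concepts(context):
    """Brute force: close every subset of words into an (extent, intent) pair."""
    words = sorted(frozenset().union(*context.values()))
    concepts = set()
    for r in range(len(words) + 1):
        for subset in combinations(words, r):
            ext = extent(frozenset(subset), context)
            concepts.add((ext, intent(ext, context)))
    return concepts


for ext, itt in sorted(formal_concepts(TWEETS), key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
```

Ordering the concepts by inclusion of their extents yields the concept lattice. For this toy context there are 10 concepts, for example ({t1, t2}, {apple, iphone}).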

The use of quantitative characteristics of informational streams for marketing trend forecasting and for the analysis of users’ attitude towards different goods and services (Opinion Mining)
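A basic quantitative characteristic of an informational stream is the daily support of a keyword set: the fraction of a day's tweets that contain all of its keywords. The sketch below computes such a series from (day, tokens) pairs. The stream and the keyword set are hypothetical. A series like this is the kind of input the autoregressive models mentioned next would consume.

```python
from collections import defaultdict
from datetime import date


def daily_support(stream, keywords):
    """Return {day: fraction of that day's tweets containing every keyword}, sorted by day."""
    keywords = frozenset(keywords)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, words in stream:
        totals[day] += 1
        if keywords <= set(words):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Hypothetical stream of (day, tokenized tweet) pairs.
STREAM = [
    (date(2013, 7, 1), ["iphone", "buy", "new"]),
    (date(2013, 7, 1), ["iphone", "battery", "bad"]),
    (date(2013, 7, 2), ["iphone", "buy"]),
    (date(2013, 7, 2), ["android", "buy"]),
    (date(2013, 7, 2), ["iphone", "buy", "cheap"]),
]

series = daily_support(STREAM, {"iphone", "buy"})
print(series)
```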

Detecting the predictive potential of association rules in informational streams and using these rules in autoregressive models (ARIMA, VAR) to predict, in particular, financial trends on stock markets. Such a model takes into account both the past behavior of a company's financial time series and the time dynamics of the quantitative characteristics of association rules.
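For illustration, a first-order VAR model of this kind can be written as below. The notation is introduced here and does not come from the slides. y_t is the stock price (or return) on day t, and x_t is a quantitative characteristic of an association rule on day t, for example the share of that day's tweets supporting it. The lag order of 1 is an assumption made for brevity.

$$
\begin{pmatrix} y_t \\ x_t \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} +
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} y_{t-1} \\ x_{t-1} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix},
\qquad
\hat{y}_{t+1} = c_1 + a_{11}\, y_t + a_{12}\, x_t .
$$

The coefficient a_{12} is where the tweet stream enters the price equation. If it is zero, the previous day's rule characteristic adds nothing to the linear forecast of the price beyond the price's own history.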

Page 3

The analysis of communities and of the leaders who shape the analyzed trends in social networks. The analysis of whether users’ attitudes towards a particular commodity or economic trend are being shaped manipulatively.
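The minimal leader-detection sketch below assumes a directed graph in which an edge (a, b) means that user a retweeted or mentioned user b. Users are ranked by the number of distinct other users who engage with them (in-degree). The edges are hypothetical. In-degree is only one possible influence measure; PageRank-style scores are a common alternative.

```python
from collections import defaultdict


def top_leaders(edges, k=3):
    """Rank users by how many distinct other users retweet or mention them.

    `edges` holds (source_user, target_user) pairs meaning
    "source retweeted or mentioned target"; self-loops are ignored.
    """
    engaged_by = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            engaged_by[dst].add(src)
    ranking = sorted(engaged_by.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(user, len(srcs)) for user, srcs in ranking[:k]]


# Hypothetical retweet/mention edges.
EDGES = [
    ("ann", "bbc"), ("bob", "bbc"), ("cid", "bbc"), ("dan", "bbc"),
    ("ann", "tom"), ("bob", "tom"),
    ("eve", "ann"), ("bbc", "bbc"),
]
leaders = top_leaders(EDGES, k=2)
print(leaders)
```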

Causality analysis based on Granger tests, to single out the principal (leading) and subordinate time series, in particular for informational streams, economic indicators, etc.
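For a lag order p, a Granger test compares two regressions of the same form as Model 1 and Model 2 in the R output on page 7. One uses the lags of both series; the other uses only the target's own lags, and its coefficients (primed) are estimated separately. The notation here is ours.

$$
\text{Model 1:}\quad y_t = c + \sum_{i=1}^{p} a_i\, y_{t-i} + \sum_{i=1}^{p} b_i\, x_{t-i} + \varepsilon_t
$$

$$
\text{Model 2:}\quad y_t = c' + \sum_{i=1}^{p} a_i'\, y_{t-i} + \varepsilon_t'
$$

$$
F = \frac{(\mathrm{RSS}_2 - \mathrm{RSS}_1)/p}{\mathrm{RSS}_1/(T - 2p - 1)}
$$

Here T is the number of observations entering the regressions and RSS is the residual sum of squares. Under the null hypothesis H_0: b_1 = \dots = b_p = 0, F approximately follows an F(p, T - 2p - 1) distribution. A small p-value means that x Granger-causes y, that is, its past improves the prediction of y. With p = 1, the residual degrees of freedom are T - 3 and T - 2, so the values 87 and 88 on page 7 correspond to T = 90.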

A recommendation subsystem for users. For example, in an online store, such a system analyzes users’ behavior, their purchases, and their feedback on goods or services. Based on a user’s activity, one can build his/her semantic profile and then make offers that take into account both his/her own activity and the decisions of users with similar profiles. Such an approach can significantly shorten the time a user spends searching for goods and services. It can also surface relevant offers the user did not know about, revealed from the activity of similar users.
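The recommendation idea above corresponds to user-based collaborative filtering. The minimal sketch below makes two assumptions of our own, since the slide does not specify them: a semantic profile is a {keyword: weight} dictionary, and similarity is cosine similarity. It finds the users most similar to the target user and scores the items they bought that the target user has not. All data are hypothetical.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity of two sparse profiles given as {feature: weight} dicts."""
    dot = sum(w * q.get(f, 0.0) for f, w in p.items())
    norm = sqrt(sum(w * w for w in p.values())) * sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def recommend(user, profiles, purchases, k=2):
    """Items bought by the k most similar users but not by `user`, best first."""
    neighbours = sorted(
        (u for u in profiles if u != user),
        key=lambda u: cosine(profiles[user], profiles[u]),
        reverse=True,
    )[:k]
    scores = {}
    for u in neighbours:
        sim = cosine(profiles[user], profiles[u])
        for item in purchases[u] - purchases[user]:
            scores[item] = scores.get(item, 0.0) + sim  # similarity-weighted vote
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical semantic profiles and purchase histories.
PROFILES = {
    "alice": {"camera": 3, "travel": 2},
    "bob": {"camera": 2, "travel": 3},
    "carol": {"football": 4, "beer": 1},
    "dave": {"camera": 1, "lens": 2},
}
PURCHASES = {
    "alice": {"tripod"},
    "bob": {"tripod", "backpack"},
    "carol": {"ball"},
    "dave": {"lens", "tripod"},
}
print(recommend("alice", PROFILES, PURCHASES))
```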

Page 4

The analysis of financial tweets

The package “Tweet Miner for Stock Market”

Page 5

Formation of frequent keyword sets with the highest support values

The analysis of financial tweets
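Forming frequent keyword sets and picking those with the highest support can be sketched with a level-wise, Apriori-style search. Support here is the fraction of tweets containing every keyword of the set. The tokenized tweets and the min_support threshold are hypothetical. This is not necessarily the algorithm used in "Tweet Miner for Stock Market".

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search; returns {itemset: support} for all sets with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([item]) for t in transactions for item in t}
    result = {}
    size = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        size += 1
        # Candidates one item larger whose every (size-1)-subset is frequent (Apriori pruning).
        level = set()
        for a, b in combinations(list(frequent), 2):
            cand = a | b
            if len(cand) == size and all(
                frozenset(s) in frequent for s in combinations(cand, size - 1)
            ):
                level.add(cand)
    return result


def top_by_support(itemsets, k):
    """The k itemsets with the highest support (ties broken alphabetically)."""
    return sorted(itemsets.items(), key=lambda kv: (-kv[1], sorted(kv[0])))[:k]


# Hypothetical tokenized tweets.
TWEETS = [
    {"apple", "stock", "buy"},
    {"apple", "stock", "sell"},
    {"apple", "iphone"},
    {"apple", "stock", "buy"},
]
result = frequent_itemsets(TWEETS, min_support=0.5)
top3 = top_by_support(result, 3)
for itemset, support in top3:
    print(sorted(itemset), support)
```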

Page 6

The analysis of financial tweets

The analysis of the causal relationship between frequent sets in tweets and Apple stock prices. The results obtained show that it is possible to predict stock prices on the basis of data mining of informational streams in social networks.

Page 7

Test 1: Granger causality test
Model 1: V3 ~ Lags(V3, 1:1) + Lags(V2, 1:1)
Model 2: V3 ~ Lags(V3, 1:1)
  Res.Df Df     F   Pr(>F)
1     87
2     88 -1 10.05 0.002103 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Test 2: Granger causality test
Model 1: V2 ~ Lags(V2, 1:1) + Lags(V3, 1:1)
Model 2: V2 ~ Lags(V2, 1:1)
  Res.Df Df      F Pr(>F)
1     87
2     88 -1 0.3261 0.5694

The analysis of financial tweets

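Granger causality tests between quantitative characteristics of tweets and Apple stock prices. In test 1, adding the lagged values of V2 significantly improves the model of V3 (F = 10.05, p ≈ 0.0021). In test 2, adding the lagged values of V3 does not improve the model of V2 (F = 0.3261, p ≈ 0.57).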
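The page 7 output format appears to come from R's lmtest::grangertest. The stdlib-only Python sketch below is our own illustration, not the author's code. It computes the same lag-1 F statistic by fitting Model 1 and Model 2 with ordinary least squares on synthetic data. The p-value is left out because it needs the F distribution; if SciPy is available, scipy.stats.f.sf(F, 1, df_resid) gives it.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit y ~ X (X includes an intercept column)."""
    k = len(X[0])
    # Normal equations (X^T X) beta = X^T y, solved by Gauss-Jordan elimination with pivoting.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    beta = [b[i] / A[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def granger_f_lag1(y, x):
    """F statistic and residual df for H0: the lag of x does not help predict y (lag order 1)."""
    target = y[1:]
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]  # Model 1
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]      # Model 2
    rss1 = ols_rss(full, target)
    rss2 = ols_rss(restricted, target)
    df_resid = len(target) - 3
    return (rss2 - rss1) / (rss1 / df_resid), df_resid


# Synthetic data: y follows x with a one-step delay plus small noise.
rng = random.Random(0)
n = 200
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, n)]
f_xy, df_xy = granger_f_lag1(y, x)  # does x Granger-cause y?
f_yx, df_yx = granger_f_lag1(x, y)  # does y Granger-cause x?
print(f_xy, f_yx, df_xy)
```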

Forecasting based on ARIMA model
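The slide does not state which ARIMA order was used. The minimal ARIMA(1,1,0) sketch below is therefore an assumption. It differences the price series once, fits an AR(1) to the differences by conditional least squares, iterates the AR recursion forward, and adds the predicted differences back onto the last observed price. Library implementations usually estimate by maximum likelihood, so their coefficients can differ slightly. The prices are synthetic.

```python
def fit_ar1(series):
    """Least-squares fit of z_t = c + phi * z_{t-1} + e_t; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(curr)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    cov = sum((p - mean_p) * (q - mean_c) for p, q in zip(prev, curr))
    var = sum((p - mean_p) ** 2 for p in prev)
    phi = cov / var
    return mean_c - phi * mean_p, phi


def arima110_forecast(prices, steps):
    """ARIMA(1,1,0) forecast: AR(1) on first differences, then undo the differencing."""
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    c, phi = fit_ar1(diffs)
    level, diff = prices[-1], diffs[-1]
    out = []
    for _ in range(steps):
        diff = c + phi * diff  # predicted next difference
        level += diff          # integrate back to the price level
        out.append(level)
    return out


# Synthetic prices whose differences follow d_t = 1 + 0.5 * d_{t-1} exactly.
prices = [100.0, 100.0, 101.0, 102.5, 104.25, 106.125]
print(arima110_forecast(prices, 2))
```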

Forecasting based on VAR model
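A two-variable VAR(1) can be fitted equation by equation with ordinary least squares. Each equation regresses one series on a constant and the previous values of both series. The sketch below rests on our own assumptions: lag order 1, synthetic data, and a hand-written solver. In practice one would use a library such as R's vars package or Python's statsmodels.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_var1(y, x):
    """OLS fit of a bivariate VAR(1); returns [c, coef on y_{t-1}, coef on x_{t-1}] per equation."""
    X = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    coefs = []
    for target in (y[1:], x[1:]):
        Xty = [sum(r[i] * v for r, v in zip(X, target)) for i in range(3)]
        coefs.append(solve(XtX, Xty))
    return coefs


def var1_forecast(coefs, y_last, x_last, steps):
    """Iterate the fitted VAR(1) forward, feeding forecasts back in as lagged values."""
    out = []
    for _ in range(steps):
        y_next = coefs[0][0] + coefs[0][1] * y_last + coefs[0][2] * x_last
        x_next = coefs[1][0] + coefs[1][1] * y_last + coefs[1][2] * x_last
        y_last, x_last = y_next, x_next
        out.append((y_next, x_next))
    return out


# Synthetic example: simulate a known VAR(1) and refit it.
rng = random.Random(1)
ys, xs = [0.0], [0.0]
for _ in range(2000):
    y_prev, x_prev = ys[-1], xs[-1]
    ys.append(1.0 + 0.5 * y_prev + 0.2 * x_prev + rng.gauss(0.0, 0.5))
    xs.append(0.3 + 0.1 * y_prev + 0.6 * x_prev + rng.gauss(0.0, 0.5))
coefs = fit_var1(ys, xs)
print(coefs)
print(var1_forecast(coefs, ys[-1], xs[-1], steps=2))
```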

Page 8

The examples of the studies of semantic concepts in Twitter messages

Page 9

The Final Olympic Tennis Tournament (2012)

The examples of the studies of semantic concepts in Twitter messages

Page 10

The examples of test studies of semantic concepts in Twitter messages

Before the Eurovision 2013 final, we published our forecast of the winner and the favorites in our blog; the forecast later proved to be correct.

The prediction of Eurovision 2013 favorites

Page 11

The analysis of travel trends

The examples of test studies of semantic concepts in Twitter messages

Travel trends

Page 12

The examples of test studies of semantic concepts in Twitter messages

The analysis of travel trends

Travel trends

Page 13

The examples of test studies of semantic concepts in Twitter messages

Market analysis of iPhone concept

Page 14

The examples of test studies of semantic concepts in Twitter messages

Market analysis of iPhone concept

Page 15

The examples of test studies of semantic concepts in Twitter messages

In this work, we analyze a possible correlation between the public opinion of Twitter users and the decisions of persons who are influential in society. We carry out this analysis using as an example the discussion of the probable name of the British royal baby born in July 2013. In our study, we use methods of quantitative natural language processing, the theory of frequent sets, and algorithms for the visual display of users' communities. We also analyzed the time dynamics of keyword frequencies. The analysis showed that the main predicted name dominated the spectrum of names before the official announcement. Using the theory of frequent sets, we showed that the full name, consisting of three component names, was among the top 5 sets by support value. It was revealed that the structure of the dynamically formed communities of users participating in the discussion is determined by only a few leaders who significantly influence the viewpoints of other users.
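The time dynamics of name frequencies can be sketched as follows. For each day, count the mentions of each candidate name and divide by all candidate-name mentions that day, then take the name with the largest share as that day's dominating name. This is a hypothetical illustration (invented names, numbered days, plain token matching), not the study's pipeline.

```python
from collections import Counter, defaultdict


def name_shares(stream, candidates):
    """Per day, each candidate's share among all candidate-name mentions of that day."""
    counts = defaultdict(Counter)
    for day, words in stream:
        for w in words:
            if w in candidates:
                counts[day][w] += 1
    shares = {}
    for day in sorted(counts):
        total = sum(counts[day].values())
        shares[day] = {name: counts[day][name] / total for name in sorted(candidates)}
    return shares


def leader_per_day(shares):
    """The dominating name of each day (ties broken alphabetically)."""
    return {day: min(s, key=lambda name: (-s[name], name)) for day, s in shares.items()}


# Hypothetical stream of (day number, tokenized tweet) pairs and candidate names.
STREAM = [
    (1, ["baby", "anna", "mark"]),
    (1, ["mark", "royal"]),
    (2, ["mark", "paul"]),
    (2, ["mark"]),
    (2, ["anna"]),
]
shares = name_shares(STREAM, {"anna", "mark", "paul"})
print(shares)
print(leader_per_day(shares))
```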

The prediction of Royal baby’s name

Page 16

The examples of test studies of semantic concepts in Twitter messages

Royal baby’s name forecasting

The name George was dominating in the spectrum of names before the official announcement.

Page 17

The examples of test studies of semantic concepts in Twitter messages

Royal baby’s name forecasting

The first 10 frequent sets were formed from five names, three of which are the components of the Prince’s full name, George Alexander Louis.

Page 18

The examples of test studies of semantic concepts in Twitter messages

The Royal baby’s name forecasting

Users’ communities that formed the discussion trends.
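One simple way to sketch users' communities and their leaders assumes an undirected graph of user interactions (retweets, mentions, replies). Each connected component is taken as a community, and its most connected user as the leader. Connected components are a crude stand-in chosen here for brevity. The slides do not say which community detection algorithm was used, and modularity-based or label-propagation methods would usually split large components further. The edges are hypothetical.

```python
from collections import defaultdict


def communities_with_leaders(edges):
    """Connected components of the undirected interaction graph, each with its highest-degree user."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        # max() keeps the first maximum, so ties go to the alphabetically first user.
        leader = max(sorted(members), key=lambda u: len(graph[u]))
        result.append((leader, sorted(members)))
    return result


# Hypothetical interaction edges.
EDGES = [
    ("ann", "bbc"), ("bob", "bbc"), ("cid", "bbc"),
    ("eve", "joe"), ("fay", "joe"), ("eve", "fay"), ("gus", "joe"),
]
for leader, members in communities_with_leaders(EDGES):
    print(leader, members)
```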

Page 19

More test examples and studies are in my blog http://bpavlyshenko.blogspot.com

Bohdan Pavlyshenko, Ph.D., e-mail: [email protected]

Thank you for your attention!