predicting the future with social media

What The Future Holds For Social Media Data Analysis Predictive analytics using Twitter data Peter Wlodarczak [email protected]

Upload: peter-wlodarczak

Post on 11-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Predicting the future with social media

What The Future Holds For Social Media Data Analysis

Predictive analytics using Twitter data

Peter Wlodarczak [email protected]

Page 2: Predicting the future with social media

Agenda

Introduction

Research methodology

Applications

Challenges

Conclusions

Page 3: Predicting the future with social media

Introduction I

Shift from publisher-generated to user-created content

90% of the content on the Internet is now

user generated (Graham et al. 2011)

Unprecedented amount of opinionated

data on the Internet

Online social networks (OSN) are one of

the biggest data sources of the internet

(Oboler, Welsh & Cruz 2012)

Page 4: Predicting the future with social media

Introduction II

Opinions can be expressed on the

Internet without programming knowledge

(Web 2.0)

Opinions are key influences of human

behavior

People increasingly consult the Internet

before making decisions

Page 5: Predicting the future with social media

Introduction III

OSN give new insights into people's opinions, interests and views

Social networking Web sites are amassing vast quantities of data

Computational social science is providing tools to

process this data (Oboler, Welsh & Cruz 2012)

Social computing, a new paradigm of computing

and technology development, has become a

central theme across a number of information and

communication technology fields (Wang et al.

2007, p. 79)

Page 6: Predicting the future with social media

Introduction IV

Growing interest in Social Media Mining (SMM) in the market

Gnip, Klout, DataSift and Sprout Social specialize in SM data analysis

Apple bought Topsy for 200 million US dollars

(Harris 2013)

TV stations buy Facebook data to see how

popular their shows are (Rusli 2013)

No surveys necessary

Page 7: Predicting the future with social media

Introduction V

Research in the area of computational social science and Big Data

Social computing is a cross-disciplinary research and application field with theoretical underpinnings including both computational and social sciences (Wang et al. 2007, p. 80)

Big Data is the ability of society to harness

information in novel ways to produce useful

insights or goods and services of significant value

(Mayer-Schonberger & Cukier 2013, p. 2)

Page 8: Predicting the future with social media

Introduction VI

Analyzing data to:

Understand the underlying structure of it

and gain knowledge

Make predictions from new, unseen

examples

Page 9: Predicting the future with social media

Introduction VII

Current behavior is an indication of future decisions

New area of research: predictive

analytics

Machine learning techniques used for

prediction

Learning from experience, “data”, to predict

future behavior of individuals

Support decision making process

Page 10: Predicting the future with social media

Introduction VIII

Big Data

Big Data is usually defined by the three V's: volume, velocity and variety (Klein, Tran-Gia & Hartmann 2013, p. 320)

High volume

Created at high velocity

Structured, semi-structured and unstructured

Page 11: Predicting the future with social media

Introduction IX

Big Data principles

No sample selection, all data analysed

Data doesn’t have to be of high quality

Structured and unstructured data

Page 12: Predicting the future with social media

Introduction X

Data mining

Techniques for finding and describing

structural patterns in data

Tool for helping to explain that data and

make predictions from it (Witten, Frank &

Hall 2011, p. 8)

Used to

gain knowledge

make predictions

Page 13: Predicting the future with social media

Introduction XI

Data analysis steps

Analyze mood by means of sentiment

analysis

Create time series and correlate them with real-world phenomena

Make predictions based on new data

Support decision making process

Page 14: Predicting the future with social media

Introduction XII

Social Media data has been analysed to

predict

Financial indicators (Bollen, Mao & Zeng

2010)

Elections (Tumasjan et al. 2011)

Box office revenue (Asur & Huberman 2010)

Disease outbreak (Achrekar et al. 2011)

Natural disasters (Sakaki, Okazaki & Matsuo 2010)

Page 15: Predicting the future with social media

Research methodology I

Predictive analysis of Social Media

consists of two phases

Data conditioning phase

Predictive analysis phase

Page 16: Predicting the future with social media

Research methodology II

Data Conditioning Phase

• Collection and filtering of raw data: determination of time window, selection of search terms, selection of data extraction method

• Computation of predictor variables: selection of prediction variables, measurement of predictor variables

Predictive Analysis Phase

• Creation of predictive model: selection of predictive method, identification of data for evaluation of prediction

• Evaluation of the predictive performance: selection of the evaluation method, specification of the prediction baseline

Analysis phases

Page 17: Predicting the future with social media

Research methodology III

Input and output variables

Twitter sentiments: expressed as binary sentiment classification

Share price: expressed in dollars

Future share price: expressed in dollars

Page 18: Predicting the future with social media

Research methodology IV

Mood towards

Apple

Number of

Tweets

Apple stock

price

Page 19: Predicting the future with social media

Data collection and analysis overview

Data collection

• Query Twitter through API

• Store in MongoDB

Preprocessing

• Remove stopwords

• Remove Tweets with links

Model evaluation

• Classification algorithm

• Neural network

Time series

• Twitter volume

• Binary sentiment classification

Correlation

• Correlation between sentiment and financial data

Collection and analysis steps overview

Some steps like model evaluation are

iterative

Page 20: Predicting the future with social media

Data collection I

[Pipeline overview diagram: Data collection (query Twitter through API, store in DB) → Preprocessing → Model evaluation → Time series → Correlation]

Page 21: Predicting the future with social media

Data collection II

Data Source

Twitter

Query API

Firehose API

Gardenhose API

Data Store

MongoDB

Historic data collected through Twitter

APIs

Timestamp, message text, region

Page 22: Predicting the future with social media

Data collection III

Data collected through Twitter query

API

Using the Java programming language

Using the Twitter4j library

Stored as JSON (JavaScript Object Notation) in MongoDB

Page 23: Predicting the future with social media

Data collection IV

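import java.util.List;

import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.auth.AccessToken;

// ACCESS_TOKEN, ACCESS_TOKEN_SECRET, CUSTOMER_KEY and CUSTOMER_SECRET are
// String constants of the enclosing class that hold the OAuth credentials.
public void runQuery() {
    Twitter twitter = new TwitterFactory().getInstance();
    AccessToken accessToken = new AccessToken(ACCESS_TOKEN, ACCESS_TOKEN_SECRET);
    // Authenticate: consumer key and secret first, then the access token.
    twitter.setOAuthConsumer(CUSTOMER_KEY, CUSTOMER_SECRET);
    twitter.setOAuthAccessToken(accessToken);
    try {
        // Search for Tweets carrying Apple's cashtag (ticker symbol AAPL).
        Query query = new Query("$AAPL");
        QueryResult result = twitter.search(query);
        List<Status> tweets = result.getTweets();
        // Print the author and text of every Tweet returned.
        for (Status tweet : tweets) {
            System.out.println("@" + tweet.getUser().getScreenName() + " - " + tweet.getText());
        }
    } catch (TwitterException te) {
        te.printStackTrace();
        System.out.println("Failed to search tweets: " + te.getMessage());
        System.exit(-1);
    }
}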

Twitter query algorithm to retrieve Tweets on Apple

Page 24: Predicting the future with social media

Data preprocessing I

[Pipeline overview diagram: Data collection → Preprocessing → Model evaluation → Time series → Correlation]

Page 25: Predicting the future with social media

Data preprocessing II

Remove stop-words: “the”, “then”, “at” …

Remove punctuation: apostrophes, brackets, colons …

Discard Tweets with no explicit statement, like “Going to the Apple store”

Discard irrelevant Tweets, like “I love apples and pears”

Discard possible spam by discarding Tweets that match the regular expression “http:” or “www”
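
The mechanical part of these preprocessing steps could look roughly like the following Java sketch. It is only an illustration under assumptions: the class name `TweetPreprocessor`, the short stop-word list and the addition of "https:" to the link pattern are not from the slides, and the two semantic filters (no explicit statement, irrelevant Tweets) are not covered because they need more than string matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class TweetPreprocessor {
    // Illustrative stop-word list (assumption); a real list would be much longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "then", "at", "a", "to", "is"));

    // Matches "http:" or "www" as on the slide; "https:" is an added assumption.
    private static final Pattern LINK =
            Pattern.compile("https?:|www", Pattern.CASE_INSENSITIVE);

    /** Returns true if the Tweet contains a link and is treated as possible spam. */
    static boolean containsLink(String tweet) {
        return LINK.matcher(tweet).find();
    }

    /** Lower-cases the Tweet, strips punctuation and removes stop-words. */
    static List<String> tokens(String tweet) {
        // Replace everything except letters, digits and whitespace with a space.
        String cleaned = tweet.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}\\s]", " ");
        List<String> result = new ArrayList<>();
        for (String word : cleaned.trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] raw = {
            "The new iPhone is great!",
            "Win an iPad at www.example.com"
        };
        for (String tweet : raw) {
            if (containsLink(tweet)) {
                continue; // discard possible spam
            }
            System.out.println(tokens(tweet));
        }
    }
}
```

Only the first sample Tweet survives the link filter; the second is dropped before tokenisation.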

Page 26: Predicting the future with social media

Data preprocessing III

Machine learning algorithms don’t take text

as input

Create feature vector

Word frequencies

n-grams, unigram, bigram, trigram …

“good”, “very good”, “not very good”

Create sentiment lexicon

Sentiment analysis highly domain specific

“This mattress had a valley after one month”

“This car uses a lot of fuel”
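
As a rough illustration of the feature-vector step, the sketch below counts unigrams, bigrams and trigrams for a tokenised Tweet. The class and method names (`NGramFeatures`, `nGrams`, `frequencies`) are hypothetical, and the sparse map of counts is just one possible representation of a feature vector, not necessarily the one used in the study.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NGramFeatures {
    /** Returns all contiguous n-grams of the token list, joined by a space. */
    static List<String> nGrams(List<String> tokens, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    /** Counts every n-gram for n = 1..maxN; the map acts as a sparse feature vector. */
    static Map<String, Integer> frequencies(List<String> tokens, int maxN) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int n = 1; n <= maxN; n++) {
            for (String gram : nGrams(tokens, n)) {
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("not", "very", "good");
        // Prints the unigram, bigram and trigram counts for the sample phrase.
        System.out.println(frequencies(tokens, 3));
    }
}
```

The trigram "not very good" shows why higher-order n-grams matter for sentiment: the unigram "good" alone would suggest the opposite polarity.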

Page 27: Predicting the future with social media

Model evaluation I

[Pipeline overview diagram: Data collection → Preprocessing → Model evaluation → Time series → Correlation]

90.2 %

84.7 %

97.3 %

Neural Network

Naïve Bayes

Nearest Neighbor

Page 28: Predicting the future with social media

Model evaluation II

Experience shows that no single machine

learning scheme is appropriate to all data

mining problems (Witten, Frank & Hall 2011,

p. 403)

Different algorithms are trained

The best performing algorithm will be

selected

Page 29: Predicting the future with social media

Model evaluation III

Data classification and analysis through

Machine learning techniques

System can learn from data, e. g. detect spam

Finding and describing structural patterns in

data and generalize

Data classification is a supervised

learning problem

Class label is known

Page 30: Predicting the future with social media

Model evaluation IV

Other machine learning models are

Unsupervised learning

Class label is unknown

Used for cluster analysis

Semi-supervised learning

Small amount of labeled data, big volumes of

unlabeled data

Page 31: Predicting the future with social media

Model evaluation V

Model evaluation through iterative supervised

machine learning process

Select classification algorithm: Naïve Bayes, k-NN, decision tree induction …

Find a function ƒ that classifies Tweets into

positive and negative Tweets

Data is divided into training and test data

Model is trained using the training data

Trained model is verified using the test data

Page 32: Predicting the future with social media

Model evaluation VI

Determine through loss function how well the

model performs on future, unseen data

Calculate error:

Training error = fraction of training examples misclassified

Test error = fraction of test examples misclassified

Generalization error = probability of misclassifying a new random example

Page 33: Predicting the future with social media

Model evaluation VII

Testing determines the classification

accuracy

$$\text{Accuracy} = \frac{\text{number of correct classifications}}{\text{total number of test cases}}$$

Simple, but very optimistic if the training data is also used for testing

Page 34: Predicting the future with social media

Model evaluation VIII

n-fold cross-validation

Divide data into n folds, where typically 4 < n < 11

Data divided randomly into n folds

n – 1 folds used for training, 1 holdout fold for testing

Error rate is calculated on the holdout fold

Repeated n times such that each fold is the holdout fold once

Error estimate is averaged over all n error rates
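
The cross-validation procedure above can be sketched in Java as follows. This is a minimal illustration, not the setup used in the study: the 1-nearest-neighbour classifier, the one-dimensional toy data and all names (`CrossValidation`, `Example`, `demoData`) are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class CrossValidation {
    /** A labelled example with a single numeric feature (toy assumption). */
    static final class Example {
        final double feature;
        final boolean positive;

        Example(double feature, boolean positive) {
            this.feature = feature;
            this.positive = positive;
        }
    }

    /** 1-nearest-neighbour prediction: returns the label of the closest training example. */
    static boolean predict(List<Example> training, double x) {
        Example nearest = training.get(0);
        for (Example e : training) {
            if (Math.abs(e.feature - x) < Math.abs(nearest.feature - x)) {
                nearest = e;
            }
        }
        return nearest.positive;
    }

    /** Returns the error rate averaged over n folds; assumes data.size() >= n. */
    static double crossValidate(List<Example> data, int n, long seed) {
        List<Example> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed)); // random assignment to folds
        double errorSum = 0.0;
        for (int fold = 0; fold < n; fold++) {
            List<Example> train = new ArrayList<>();
            List<Example> holdout = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                (i % n == fold ? holdout : train).add(shuffled.get(i));
            }
            int wrong = 0;
            for (Example e : holdout) {
                if (predict(train, e.feature) != e.positive) {
                    wrong++;
                }
            }
            errorSum += (double) wrong / holdout.size(); // error on the holdout fold
        }
        return errorSum / n; // average over all n error rates
    }

    /** Two well-separated clusters: negatives near 0, positives near 10. */
    static List<Example> demoData() {
        List<Example> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(new Example(i * 0.1, false));
            data.add(new Example(10.0 + i * 0.1, true));
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println("Mean error over 5 folds: " + crossValidate(demoData(), 5, 42L));
    }
}
```

Because the two toy clusters are far apart, every holdout example is classified correctly and the averaged error is zero; real Tweet data would of course give a non-zero estimate.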

Page 35: Predicting the future with social media

Model evaluation IX

Typical data mining task goes through many

iterations

As many iterations as necessary until the result is satisfactory, i.e. accuracy converges

Best data mining scheme is selected

Used against unseen data for classification

Can be used on real-time data

Page 36: Predicting the future with social media

Model evaluation X

RapidMiner workbench

Page 37: Predicting the future with social media

Model evaluation XI

Training data

Name      sex     mask  cape  tie  ears  smokes  class
Batman    male    yes   yes   no   yes   no      Good
Robin     male    yes   yes   no   no    no      Good
Alfred    male    no    no    yes  no    no      Good
Penguin   male    no    no    yes  no    yes     Bad
Catwoman  female  yes   no    no   yes   no      Bad
Joker     male    no    no    no   no    no      Bad

Test data

Name      sex     mask  cape  tie  ears  smokes  class
Batgirl   female  yes   yes   no   yes   no      ?
Riddler   male    yes   no    no   no    no      ?

Page 38: Predicting the future with social media

Model evaluation XII

Description of data:

if sex = male and mask = yes and cape = yes and tie = no and ears = yes and smokes = no then character = Good

Generalisation for new examples:

if mask = yes and ears = yes and smokes = no then character = Good

Page 39: Predicting the future with social media

Model evaluation XIII

tie = no: split on cape

    cape = no: bad

    cape = yes: good

tie = yes: split on smokes

    smokes = no: good

    smokes = yes: bad
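
Applying such a tree to the two unlabelled test characters could be sketched as below. The sketch assumes the tree splits first on tie, then on cape (tie = no) or smokes (tie = yes), with leaf labels chosen so that all six training examples from Page 37 are classified correctly; the class name `SuperheroTree` is hypothetical.

```java
final class SuperheroTree {
    /** Walks the assumed decision tree: root on tie, then cape or smokes. */
    static String classify(boolean tie, boolean cape, boolean smokes) {
        if (tie) {
            // Right subtree: characters wearing a tie are split on smoking.
            return smokes ? "bad" : "good";
        }
        // Left subtree: characters without a tie are split on wearing a cape.
        return cape ? "good" : "bad";
    }

    public static void main(String[] args) {
        // Test data from the slide: Batgirl (tie = no, cape = yes, smokes = no)
        // and Riddler (tie = no, cape = no, smokes = no).
        System.out.println("Batgirl: " + classify(false, true, false));
        System.out.println("Riddler: " + classify(false, false, false));
    }
}
```

Under these assumptions Batgirl is classified as good and Riddler as bad; only three of the six attributes are needed, which is the kind of compact generalisation a small tree provides.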

Page 40: Predicting the future with social media

Model evaluation XIV

Trees must be:

Big enough to fit training data

Big enough to capture true patterns

Not too big (Ockham’s razor):

Overfitting

Capture noise

Find spurious patterns

Page 41: Predicting the future with social media

Model evaluation XIV

Best tree size cannot be determined

from training error

Schapire 2004

Page 42: Predicting the future with social media

Model evaluation XV

Schapire 2004

Page 43: Predicting the future with social media

Model evaluation XVI

For building an accurate classifier:

Enough training examples

Good performance on training set

Classifier that is not too complex

Strategy for controlling tree size:

Build large tree that fully fits training data

Prune back

Page 44: Predicting the future with social media

Model evaluation XVII

Grow on just part of the training data, then

prune using minimum error on held out

data

Page 45: Predicting the future with social media

Classifiers I

Decision trees:

Best known:

C4.5 (Quinlan), successor C5.0

CART for classification and regression trees (Breiman et al.)

Fast to train and evaluate

Relatively easy to interpret

Accuracy often not satisfactory

Page 46: Predicting the future with social media

Classifiers II

Perceptron (Neuron)

Linear classifier

Data linearly separable using a hyperplane

$$w_0 a_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k = 0$$

where w = weights, a = real-valued feature vector, $a_0$ = bias input (always 1)

Binary classifier f(a) that maps its input vector a to a single, binary output value

Page 47: Predicting the future with social media

Classifiers III

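[Perceptron diagram: bias input 1 with weight w0; attribute inputs a1, a2, a3 with weights w1, w2, w3]

$$f(a) = \sum_{k=1}^{3} w_k a_k + b$$

where b = w_0 is the bias term

Output class depends on whether f(a) > 0 or f(a) < 0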
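
A perceptron of this form can be sketched in a few lines of Java. The decision function follows the weighted sum plus bias shown on the slide; the training loop uses the classic perceptron learning rule, which the slides do not spell out, so it is an assumption here, as are the class name `Perceptron` and the logical-AND toy data.

```java
final class Perceptron {
    // weights[0] is the bias weight w0 (its input is fixed to 1);
    // weights[k] for k >= 1 belongs to attribute a_k.
    final double[] weights;

    Perceptron(int attributes) {
        weights = new double[attributes + 1];
    }

    /** Weighted sum f(a) = w0 * 1 + w1 * a1 + ... + wk * ak. */
    double activation(double[] a) {
        double sum = weights[0];
        for (int k = 0; k < a.length; k++) {
            sum += weights[k + 1] * a[k];
        }
        return sum;
    }

    /** Binary output: class 1 if f(a) > 0, otherwise class 0. */
    int classify(double[] a) {
        return activation(a) > 0 ? 1 : 0;
    }

    /** Perceptron learning rule (assumed): shift weights by (label - output) * input. */
    void train(double[][] inputs, int[] labels, int maxEpochs) {
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            int errors = 0;
            for (int i = 0; i < inputs.length; i++) {
                int delta = labels[i] - classify(inputs[i]);
                if (delta != 0) {
                    errors++;
                    weights[0] += delta;
                    for (int k = 0; k < inputs[i].length; k++) {
                        weights[k + 1] += delta * inputs[i][k];
                    }
                }
            }
            if (errors == 0) {
                return; // every training example is classified correctly
            }
        }
    }

    public static void main(String[] args) {
        // Logical AND is linearly separable, so a single perceptron can learn it.
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] labels = {0, 0, 0, 1};
        Perceptron p = new Perceptron(2);
        p.train(inputs, labels, 100);
        for (double[] a : inputs) {
            System.out.println((int) a[0] + " AND " + (int) a[1] + " -> " + p.classify(a));
        }
    }
}
```

The same loop would never terminate with zero errors on data that is not linearly separable, such as XOR, which is the motivation for the multilayer perceptron.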

Page 48: Predicting the future with social media

Classifiers IV

Multilayer Perceptron

Non-linear classifier

Perceptrons are connected in a

hierarchical structure

Page 49: Predicting the future with social media

Classifiers V

Not all data is linearly separable

Page 50: Predicting the future with social media

Classifiers VI

[Multilayer perceptron diagram: bias input 1 and attribute inputs a1, a2 in the input layer, followed by a hidden layer and an output layer]

Page 51: Predicting the future with social media

Classifiers VII

Multilayer Perceptron

Perceptrons organized in several layers

Each layer is fully connected to the next layer

All nodes except the input nodes are perceptrons

Feedforward neural network

Uses backpropagation for training

Error propagated back to minimize loss function

Page 52: Predicting the future with social media

Classifiers VIII

Allows approximate solutions to very complex problems

Support Vector Machines (SVM) are a

much simpler alternative to ANN

Many more classifiers

k-Nearest Neighbor

Naïve Bayes

Page 53: Predicting the future with social media

Data classification I

[Pipeline overview diagram: Data collection → Preprocessing → Model evaluation → Time series → Correlation]

Page 54: Predicting the future with social media

Data Classification II

Data classification:

Binary mood polarity: positive, negative

Represented graphically as time series

Positive Tweets

Negative Tweets
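
Turning classified Tweets into the two time series could be done along the following lines. This is a sketch under assumptions: the daily granularity, the `ClassifiedTweet` type and the sample dates are illustrative and not taken from the study.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SentimentSeries {
    /** A Tweet after classification: posting day and binary polarity (hypothetical type). */
    static final class ClassifiedTweet {
        final LocalDate day;
        final boolean positive;

        ClassifiedTweet(LocalDate day, boolean positive) {
            this.day = day;
            this.positive = positive;
        }
    }

    /** Maps each day to {positive count, negative count}, sorted by date. */
    static Map<LocalDate, int[]> dailyCounts(List<ClassifiedTweet> tweets) {
        Map<LocalDate, int[]> series = new TreeMap<>();
        for (ClassifiedTweet t : tweets) {
            int[] counts = series.computeIfAbsent(t.day, d -> new int[2]);
            counts[t.positive ? 0 : 1]++;
        }
        return series;
    }

    public static void main(String[] args) {
        LocalDate d1 = LocalDate.of(2014, 2, 13);
        LocalDate d2 = LocalDate.of(2014, 2, 14);
        List<ClassifiedTweet> tweets = Arrays.asList(
                new ClassifiedTweet(d1, true),
                new ClassifiedTweet(d1, false),
                new ClassifiedTweet(d1, true),
                new ClassifiedTweet(d2, false));
        dailyCounts(tweets).forEach((day, c) ->
                System.out.println(day + " positive=" + c[0] + " negative=" + c[1]));
    }
}
```

The sorted map gives one point per day for each polarity, which is the shape needed for plotting or for comparing against daily closing prices.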

Page 55: Predicting the future with social media

Correlations I

[Pipeline overview diagram: Data collection → Preprocessing → Model evaluation → Time series → Correlation]

Sentiment polarity

Share price

Page 56: Predicting the future with social media

Correlations II

Finding correlations:

Binary sentiment classification time series

compared against stock price over same

time frame

Does a rise in the number of positive Tweets precede a surge in the Apple stock price?
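
One simple way to probe this question is to correlate the sentiment series with the price series shifted by a few days. The sketch below computes the Pearson correlation at a given lag; it is a plain linear measure, not the Granger causality test, and the numbers in `main` are made up for illustration, as are the names `LaggedCorrelation` and `atLag`.

```java
import java.util.Arrays;

final class LaggedCorrelation {
    /** Pearson correlation of two equally long series; undefined if either is constant. */
    static double pearson(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** Correlates x[t] with y[t + lag], i.e. asks whether x leads y by lag steps. */
    static double atLag(double[] x, double[] y, int lag) {
        int n = x.length - lag;
        return pearson(Arrays.copyOfRange(x, 0, n), Arrays.copyOfRange(y, lag, lag + n));
    }

    public static void main(String[] args) {
        // Made-up daily values: the price follows the positive Tweet count by one day.
        double[] positiveTweets = {1, 3, 2, 5, 4, 6};
        double[] price = {11, 12, 16, 14, 20, 18};
        for (int lag = 0; lag <= 2; lag++) {
            System.out.println("lag " + lag + ": r = " + atLag(positiveTweets, price, lag));
        }
    }
}
```

In this toy data the correlation peaks at a lag of one day, which is the pattern one would look for in real data before running a formal test.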

Page 57: Predicting the future with social media

Correlations III

Microsoft stock price (Yahoo! Finance 2014)

Page 58: Predicting the future with social media

Correlations IV

Tweet polarity and MSFT stock price

Page 59: Predicting the future with social media

Correlations V

If there are correlations in historic data,

trained model used against real time

data

Access real time Tweets using Twitter's streaming API

Firehose API (100% of real time Tweets)

Gardenhose API (10% of real time Tweets)

Spritzer API (1% of real time Tweets)

Page 60: Predicting the future with social media

Correlations VI

Since correlations are most certainly non-linear, correlating has to be automated

Bivariate Granger causality test

Determine whether one time series can be used to predict another

If past values of a time series X help to predict Y, X is said to Granger-cause Y

X provides statistically significant information about future values of Y

Page 61: Predicting the future with social media

Correlations VII

Granger test examines linear causality

among bivariate or multivariate time series

Many real-world phenomena are not linear

Non-linear extensions to Granger have

been developed

Other correlation techniques

Phase Slope Index measures temporal flux

between time series

Page 62: Predicting the future with social media

Correlations VII

Phase Slope Index is more robust than Granger since it is less sensitive to noise

Machine learning techniques such as

ANN can be used for finding

correlations

Page 63: Predicting the future with social media

Applications I

Technologies for predictive analysis

have matured

IBM SPSS

Stata

SAS

Page 64: Predicting the future with social media

Applications II

Free open source

WEKA

Partly open source

RapidMiner

Cloud solutions

IBM Watson Analytics

Google BigQuery

SAS Cloud Analytics

Page 65: Predicting the future with social media

Challenges I

Real-world data is often of very poor quality

Social Media vast, noisy and

unstructured

Getting relevant posts is challenging

Spam has become a serious issue

Detecting sarcasm very difficult

Political opinions full of irony and sarcasm

Data preprocessing one of the most

important steps

Page 66: Predicting the future with social media

Challenges II

Opinion mining remains a challenging task

Overall statement often difficult to

determine

No ground truth

Not everybody is using Social Media

Self-selection bias

Page 67: Predicting the future with social media

Conclusions I

Predictive analysis poses many

interesting research problems

Many opportunities for future research

Determining the credibility of posts (catfish,

sock puppet)

Better filtering mechanisms

More research in Machine Learning

than feature extraction

Page 68: Predicting the future with social media

Conclusions II

Correlation does not mean causation

Finding causative mechanism for

correlation

Page 69: Predicting the future with social media

Thank you for your attention

Questions?

Page 70: Predicting the future with social media

References I

Achrekar, H, Gandhe, A, Lazarus, R, Ssu-Hsin, Y and Benyuan, L 2011, 'Predicting Flu Trends using Twitter data', Computer

Communications Workshops (INFOCOM WKSHPS), IEEE, pp. 702-7.

Arias, M, Arratia, A & Xuriguera, R 2014, 'Forecasting with twitter data', ACM Trans. Intell. Syst. Technol., vol. 5, no. 1, pp. 1-24.

Asur, S & Huberman, BA 2010, 'Predicting the Future with Social Media', in Web Intelligence and Intelligent Agent Technology

(WI-IAT), 2010 IEEE/WIC/ACM International Conference on, vol. 1, pp. 492-9.

Berman, JJ 2013, Principles of Big Data, Elsevier Inc., Waltham, USA.

Bollen, J, Mao, H & Zeng, X-J 2010, 'Twitter mood predicts the stock market', Journal of Computational Science, vol. 2, p. 8.

Buhl, H, Röglinger, M, Moser, F & Heidemann, J 2013, 'Big Data', WIRTSCHAFTSINFORMATIK, vol. 55, no. 2, pp. 63-8.

Bulysheva, L & Bulyshev, A 2012, 'Segmentation modeling algorithm: a novel algorithm in data mining', Information Technology

and Management, vol. 13, no. 4, pp. 263-71.

Darwish, A & Lakhtaria, KI 2011, The Impact of the New Web 2.0 Technologies in Communication, Development, and

Revolutions of Societies, vol. 2, 2011.

Goh, KY, Heng, CS & Lin, Z 2012, ‘Social Media Brand Community and Consumer Behavior: Quantifying the Relative Impact of

User- and Marketer-Generated Content’, School of Computing, National University of Singapore, viewed 9 April 2013,

<https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2048614>.

Graham, DM, Hale, SA & Stephens, M 2011, 'User-generated Content in Google', Oxford University, Oxford, UK, viewed 27

October 2013, < http://www.oii.ox.ac.uk/vis/?id=4e3c030d>.

Harris, D 2013, 'DataSift raises $42M', Gigaom, viewed 27 December 2013, <http://gigaom.com/2013/12/03/datasift-raises-42m-

maybe-theres-something-to-this-social-data-after-all/>.

Huang, S, Peng, W, Li, J & Lee, D 2013, 'Sentiment and topic analysis on social media: a multi-task multi-label classification

approach', paper presented to Proceedings of the 5th Annual ACM Web Science Conference, Paris, France.

Kao, A, Ferng, W, Poteet, S, Quach, L & Tjoelker, R 2013, 'TALISON - Tensor analysis of social media data', in Intelligence and

Security Informatics (ISI), 2013 IEEE International Conference on, pp. 137-42.

Klein, D, Tran-Gia, P & Hartmann, M 2013, 'Big Data', Informatik-Spektrum, vol. 36, no. 3, p. 319.

Kumar, P, Nitin, Chauhan, DS & Sehgal, VK 2012, 'Selection of evolutionary approach based hybrid data mining algorithms for

decision support systems and business intelligence', paper presented to Proceedings of the International Conference on

Advances in Computing, Communications and Informatics, Chennai, India.

Page 71: Predicting the future with social media

References II

Kumar, P, Kumar Sehgal, N, Kumar Sehgal, V & Singh Chauhan, D 2012, 'A Benchmark to Select Data Mining Based

Classification Algorithms for Business Intelligence and Decision Support Systems', International Journal of Data Mining &

Knowledge Management Process, vol. 2, no. 5, pp. 25-42.

Lim, E-P, Chen, H & Chen, G 2013, 'Business Intelligence and Analytics: Research Directions', ACM Trans. Manage. Inf. Syst.,

vol. 3, no. 4, pp. 1-10.

Manyika, J, Chui, M, Brown, B, Bughin, J, Dobbs, R, Roxburgh, C & Byers, AH 2011, Big data: The next frontier for innovation,

competition, and productivity, McKinsey Global Institute.

Mayer-Schonberger, V & Cukier, K 2013, Big Data: A Revolution That Will Transform How We Live, Work, and Think, Houghton

Mifflin Harcourt Publishing Company, New York, USA.

Mayer, A 2009, 'Online social networks in economics', Decision Support Systems, vol. 47, no. 3, pp. 169-184, viewed 22

September 2013, < http://sistemas-humano-computacionais.wdfiles.com/local--files/capitulo%3Aredes-sociais/amayer.pdf>.

McKelvey, K, Rudnick, A, Conover, MD & Menczer, F 2012, 'Visualizing Communication on Social Media, Making Big Data

Accessible', Indiana University School of Informatics and Computing, viewed 29 September 2013,

<http://arxiv.org/pdf/1202.1367v1.pdf>.

Neri, F, Aliprandi, C, Capeci, F, Cuadros, M & By, T 2012, 'Sentiment Analysis on Social Media', in Advances in Social Networks

Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on, pp. 919-26.

Oboler, A, Welsh, K & Cruz, L 2012, The danger of big data: Social media as computational social science, 2012.

Ostrowski, DA 2011, 'Predictive Semantic Social Media Analysis', in Semantic Computing (ICSC), 2011 Fifth IEEE International

Conference on, pp. 283-90.

Paltoglou, G & Thelwall, M 2012, 'Twitter, MySpace, Digg: Unsupervised Sentiment Analysis in Social Media', ACM Trans. Intell.

Syst. Technol., vol. 3, no. 4, pp. 1-19.

Rusli, EM 2013, Facebook Woos TV Networks With Data, Digits, viewed 15 February 2014,

<http://blogs.wsj.com/digits/2013/09/29/facebook-woos-tv-networks-with-more-data/>.

Smith, MS, Ventura, AD, Dewey, DP, Knutson, CD & Embley, DW 2011, ‘A Computational Framework for Social Capital in Online

Communities’, Brigham Young University, viewed 28 July 2013, <http://posts.smithworx.com/publications/d.pdf>.

Page 72: Predicting the future with social media

References III

Yahoo! Finance 2014, Microsoft Corporation (MSFT), Yahoo, viewed 15 February 2014,

<http://finance.yahoo.com/echarts?s=MSFT+Interactive#symbol=msft;range=20130102,20140214;compare=;indicator=volume;charttype=area;crosshair=on;ohlcvalues=0;logscale=off;source=;>.

Trif, S 2011, 'Using Genetic Algorithms in Secured Business Intelligence Mobile Applications', Informatica economica, vol. 15, no. 1,

pp. 69-79.

Tumasjan, A, Welpe, IM, Sandner, PG, Tumasjan, A & Sprenger, TO 2011, 'Election Forecasts With Twitter: How 140 Characters

Reflect the Political Landscape', Social science computer review, vol. 29, no. 4, pp. 402-18.

Sakaki, T, Okazaki, M and Matsuo, Y 2010, 'Earthquake shakes Twitter users: real-time event detection by social sensors', Proc. of the

19th international conference on World wide web, Raleigh.

Twitter Statistics 2014, Statistic brain, viewed 18 February 2014, <http://www.statisticbrain.com/twitter-statistics/>.

Walton, A 2014, ‘Twitter Usage by Region’, Chron, viewed 18 February 2014, < http://smallbusiness.chron.com/twitter-usage-region-

62762.html>.

Wang, F-Y, Carley, KM, Zeng, D & Mao, W 2007, 'Social Computing: From Social Informatics to Social Intelligence', Intelligent

Systems, IEEE, vol. 22, no. 2, pp. 79-83.

Weka knowledge explorer, viewed 15 February 2014, <http://www.cs.waikato.ac.nz/~ml/weka/gui_explorer.html>.

Witten, IH, Frank, E & Hall, MA 2011, Data Mining, 3 edn, Elsevier, Burlington, MA, USA.

Wlodarczak, P 2014, ‘Big Personal Data’, Social Science Research Network, <http://dx.doi.org/10.2139/ssrn.2514721>.

World Stock Exchanges 2011, viewed 18 February 2014, <http://www.world-stock-exchanges.net/top10.html>.

Wong, FMF, Sen, S & Chiang, M 2012, 'Why Watching Movie Tweets Won’t Tell the Whole Story?', Cornell University, viewed 14 May

2013, <http://arxiv.org/pdf/1203.4642v1.pdf>.

Wu, X, Kumar, V, Ross Quinlan, J, Ghosh, J, Yang, Q, Motoda, H, McLachlan, GJ, Ng, A, Liu, B, Yu, PS, Zhou, Z-H, Steinbach, M,

Hand, DJ & Steinberg, D 2007, 'Top 10 algorithms in data mining', Knowledge and Information Systems, vol. 14, no. 1, pp. 1-37.

Zeng, D, Chen, H, Lusch, R & Li, S-H 2010, 'Social Media Analytics and Intelligence', Intelligent Systems, IEEE, vol. 25, no. 6, pp. 13-

6.

Zeng, L, Li, L & Duan, L 2012, 'Business intelligence in enterprise computing environment', Information Technology and Management,

vol. 13, no. 4, pp. 297-310.