#iccss2015 - computational human security analytics using "big data"

Computational Human Security Analytics using “Big Data”

Pete Burnap & Matt Williams, Social Data Science Lab

School of Computer Science and Informatics & School of Social Sciences, Cardiff University

@pbFeed @mattlwilliams

@socdatalab

COSMOS Web Observatory – cosmosproject.net

•  Integrated

•  Open (“plug and play”)

•  Scalable (MongoDB data stores / Hadoop back end)

•  Usable: developed with social scientists for social scientists

•  Reproducible/citable research

– Export/share workflow

Burnap, P. et al. (2014) ‘COSMOS: Towards an Integrated and Scalable Service for Analyzing Social Media on Demand’, International Journal of Parallel, Emergent and Distributed Systems

Web Observatory Features

•  Data Collection and Curation

– Persistent connection to the Twitter 1% stream (~4 billion tweets)

– Geocoded tweets from the UK (~200 million annually)

– Bespoke keyword-driven Twitter collections (on crime and security)

– ONS/Police APIs

– Drag-and-drop RSS feeds

– Import CSV/JSON

– Web-enabled, so data can be pushed/pulled from anywhere (e.g. other observatories)
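The sketch below shows roughly what a bespoke keyword-driven collection does once tweets arrive. It is a minimal sketch that assumes tweets are imported as JSON Lines with a Twitter-style `text` field. The helper names `load_json_lines` and `keyword_filter` are hypothetical and are not part of the COSMOS API. The keyword set is illustrative, and the filter only handles plain-word keywords.

```python
import json
import re
from typing import Iterable, Iterator


def load_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-blank line (hypothetical import format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def keyword_filter(tweets: Iterable[dict], keywords: set) -> Iterator[dict]:
    """Yield tweets whose 'text' field contains any keyword as a whole word, ignoring case."""
    if not keywords:
        return  # an empty keyword set matches nothing
    # One alternation pattern; re.escape guards against regex metacharacters in keywords.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in sorted(keywords)) + r")\b",
        re.IGNORECASE,
    )
    for tweet in tweets:
        if pattern.search(tweet.get("text", "")):
            yield tweet


if __name__ == "__main__":
    raw = ['{"text": "Police statement on Woolwich"}', "", '{"text": "Lovely weather today"}']
    for hit in keyword_filter(load_json_lines(raw), {"woolwich", "terror"}):
        print(hit["text"])  # prints: Police statement on Woolwich
```

Whole-word matching keeps a keyword like "woolwich" from firing inside longer tokens. A production collector would more likely push the keyword list to the streaming API than filter locally.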

Web Observatory Features

•  Data Transformation

– Word Frequency

– Point data frequency over time

– Social Network Analysis

– Geospatial Clustering

– Sentiment Analysis

– Demographic Analysis (gender, location, age, occupation/social class) (Sloan et al., 2015, PLoS ONE)

– API to plug in new modules and benchmark tools, plus transform data via other observatories
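Two of the simpler transformations, word frequency and point data frequency over time, can be sketched in plain Python. This reflects an assumption about what such modules compute and is not the COSMOS implementation. The tokeniser regex and the one-hour bucket width are arbitrary choices.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

# Keeps hashtags and @mentions attached to their word; an assumed, deliberately simple tokeniser.
TOKEN = re.compile(r"[#@]?\w+")


def word_frequency(texts: Iterable[str], stopwords: frozenset = frozenset()) -> Counter:
    """Count lower-cased tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in TOKEN.findall(text.lower()) if t not in stopwords)
    return counts


def frequency_over_time(timestamps: Iterable[datetime], bucket_seconds: int = 3600) -> dict:
    """Count timestamps per fixed-width bucket, keyed by the bucket's start as a UTC datetime.

    Timestamps should be timezone-aware; naive datetimes are interpreted as local time
    by datetime.timestamp().
    """
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        counts[epoch - epoch % bucket_seconds] += 1
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): n
        for start, n in sorted(counts.items())
    }


if __name__ == "__main__":
    print(word_frequency(["Send them home", "send them back"]).most_common(2))
    times = [
        datetime(2013, 5, 22, 14, 5, tzinfo=timezone.utc),
        datetime(2013, 5, 22, 14, 50, tzinfo=timezone.utc),
        datetime(2013, 5, 22, 15, 1, tzinfo=timezone.utc),
    ]
    print(frequency_over_time(times))
```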

Supervised Machine Learning & Cyber Hate Speech

•  The human-annotated hate speech sample contained numerous calls for collective action and hateful incitement towards social groups exhibiting protected characteristics.

•  For instance, there were exclamations such as “send them home”, “get them out”, and “should be hung”.

•  We implemented the Stanford Lexical Parser, along with a context-free lexical parsing model, to extract typed dependencies from the tweet text (de Marneffe et al., 2006).

•  Typed dependencies provide a representation of grammatical relationships in a sentence (or tweet in this case) that can be used as features for classification.

“Totally fed up with the way this country has turned into a haven for terrorists. Send them all back home”.

•  [root(ROOT-0, Send-1), nsubj(home-5, them-2), det(home-5, all-3), amod(home-5, back-4), xcomp(Send-1, home-5)]

•  Linguistically, therefore, the term ‘them’ is associated with ‘home’ in a relational sense (nsubj(home-5, them-2)). Sociologically, this is an “othering” phrase.

•  Combining linguistics and sociology potentially provides an interesting set of features for more nuanced classification of hate speech than a bag-of-words (BoW) approach allows.
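A typed-dependency parse like the one above can be turned into classification features. One simple option is to parse each relation and drop the token positions. This yields features such as `nsubj(home,them)` that can sit alongside n-gram features.

The sketch below assumes the parser output is available as the bracketed string shown. The regex, the feature naming and the helpers `dependency_features` and `feature_counts` are hypothetical. This is not necessarily the "reduced typed dependencies" feature set used in the study.

```python
import re
from collections import Counter

# One typed dependency, e.g. "nsubj(home-5, them-2)": relation(governor-index, dependent-index).
DEPENDENCY = re.compile(r"(\w+)\(([^,]+)-(\d+),\s*([^)]+)-(\d+)\)")


def dependency_features(parse: str) -> list:
    """Return position-free features like 'nsubj(home,them)', lower-cased, in parse order."""
    return [
        f"{rel}({gov.lower()},{dep.lower()})"
        for rel, gov, _gov_pos, dep, _dep_pos in DEPENDENCY.findall(parse)
    ]


def feature_counts(tokens: list, parse: str) -> Counter:
    """Combine unigram (bag-of-words) counts with dependency features in one sparse vector."""
    counts = Counter(t.lower() for t in tokens)
    counts.update(dependency_features(parse))
    return counts


if __name__ == "__main__":
    parse = ("[root(ROOT-0, Send-1), nsubj(home-5, them-2), det(home-5, all-3), "
             "amod(home-5, back-4), xcomp(Send-1, home-5)]")
    print(dependency_features(parse))
    print(feature_counts("Send them all back home".split(), parse))
```

Dropping the positions means the same relation between the same words counts as one feature wherever it occurs in a tweet. This is what lets a pattern like "them ... home" generalise across differently worded messages.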

Machine Classification Results

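P = precision, R = recall, F = F-measure; FP and FN are the numbers of false positives and false negatives. BLR = Bayesian logistic regression, RFDT = random forest decision tree, SVM = support vector machine.

| Feature set | Classifier | P (FP) | R (FN) | F |
|---|---|---|---|---|
| n-gram words (1-5) with 2,000 features | BLR | 0.76 (FP=46) | 0.67 (FN=74) | 0.71 |
| | RFDT | 0.76 (FP=38) | 0.55 (FN=99) | 0.64 |
| | SVM | 0.80 (FP=38) | 0.69 (FN=69) | 0.74 |
| | Voted ensemble (max probability) | 0.73 (FP=58) | 0.71 (FN=65) | 0.72 |
| n-gram hateful terms | BLR | 0.89 (FP=19) | 0.66 (FN=75) | 0.76 |
| | RFDT | 0.89 (FP=19) | 0.66 (FN=75) | 0.76 |
| | SVM | 0.89 (FP=19) | 0.66 (FN=75) | 0.76 |
| | Voted ensemble (max probability) | 0.89 (FP=19) | 0.66 (FN=75) | 0.76 |
| n-gram reduced typed dependencies + hateful terms | BLR | 0.89 (FP=19) | 0.69 (FN=70) | 0.77 |
| | RFDT | 0.89 (FP=19) | 0.68 (FN=71) | 0.77 |
| | SVM | 0.89 (FP=19) | 0.69 (FN=70) | 0.77 |
| | Voted ensemble (max probability) | 0.89 (FP=19) | 0.69 (FN=70) | 0.77 |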

Burnap, P. and Williams, M. (2015) ‘Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making’, Policy & Internet (7:2)
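Two quantities behind the table can be made concrete.

First, assume F is the balanced F-measure (the harmonic mean of P and R). It then reproduces the F column from the P and R columns to within rounding. For BLR on word n-grams: 2 × 0.76 × 0.67 / (0.76 + 0.67) ≈ 0.71.

Second, the "Voted ensemble (max probability)" rows are sketched with one common combination rule, as in Weka's Vote meta-classifier with the maximum-probability rule. For each class, take the highest probability any base classifier assigns it, then predict the class with the largest such maximum. Whether the study used exactly this rule is an assumption. The function names are hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_probability_vote(distributions: list) -> str:
    """Combine per-classifier class-probability dicts with a maximum-probability rule."""
    best = {}
    for dist in distributions:
        for label, prob in dist.items():
            # Keep the highest probability any classifier gives this label.
            best[label] = max(prob, best.get(label, 0.0))
    # Predict the label whose best probability is largest (first seen wins ties).
    return max(best, key=best.get)


if __name__ == "__main__":
    print(round(f_measure(0.76, 0.67), 2))  # 0.71 (BLR, word n-grams)
    votes = [
        {"hateful": 0.40, "not hateful": 0.60},  # e.g. BLR
        {"hateful": 0.90, "not hateful": 0.10},  # e.g. RFDT
        {"hateful": 0.30, "not hateful": 0.70},  # e.g. SVM
    ]
    print(max_probability_vote(votes))  # hateful
```

Unlike majority voting, this rule lets a single highly confident classifier outvote two less confident ones, as in the toy example.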

Theory-driven Experimental Design

•  Modeling the spread of cyber hate following a national security event

–  Does cyber hate get propagated? (size)

– Does cyber hate continue to be propagated for a long time? (survival)

•  Study of the process of action, reaction and amplification (Cohen 1972)

•  Moral panics: process of impact, inventory and reaction

•  Social response is partly responsible for deepening the impact of an event; social media (SM) reactions then act as a force amplifier

Impact

Impact “during which the disaster strikes and the immediate unorganised response to the death, injury and destruction takes place”: Initial reaction and diffusion on SM

Inventory

Inventory “during which those exposed to the disaster begin to form a preliminary picture of what has happened and of their own condition”: Diffusion of rumour and hate on SM

Reaction

Reaction “images in the inventory were crystallized into more organised opinions and attitudes”: Diffusion of wider issues on SM – immigration, religion, security etc.

Size Results

[Figure: increased likelihood of retweet (all p < 0.05) for each predictor: Far Right Political; Political; Police; Media; Cyberhate; News (per 100 stories); Google (per 100 searches); Sentiment; URL; Hashtag]

Survival Results

[Figure: Kaplan-Meier survival estimates for tweets containing cyberhate (No Cyberhate, Moderate Cyberhate, Extreme Cyberhate); y-axis 0.00 to 1.00, analysis time 0 to 1,000,000 seconds]

[Figure: Kaplan-Meier survival estimates for tweet agent type (News Agent, Police Agent, Political Agent, Far Right Political Agent, Other Agent); y-axis 0.00 to 1.00, analysis time 0 to 1,000,000 seconds]
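The survival curves above are Kaplan-Meier estimates. At each time t_i at which d_i tweets "die" out of n_i still at risk, the estimate is multiplied by (1 − d_i/n_i), so S(t) = ∏_{t_i ≤ t} (1 − d_i/n_i).

A minimal pure-Python sketch follows. It assumes each tweet contributes a duration in seconds (for instance, how long it kept being retweeted). It also assumes a flag saying whether the end of that activity was observed or censored by the end of data collection. This operationalisation and the `kaplan_meier` helper are assumptions, not taken from the slides.

```python
from collections import Counter


def kaplan_meier(durations: list, events: list) -> list:
    """Return [(t, S(t))] at each distinct time where at least one event was observed.

    durations: time until the event or censoring (assumed: seconds of retweet activity).
    events: 1 if the event was observed at that time, 0 if the observation was censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at t (events and censored)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Censored cases at time t still count as at risk when events at t are scored.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data: five tweets, two censored (still being retweeted when collection ended).
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

To compare groups as in the figures (e.g. no, moderate and extreme cyberhate), the estimator would be run once per group and the resulting step functions plotted together. A formal comparison would typically add a log-rank test.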

References

Williams, M. L. and Burnap, P. (2015) ‘Cyberhate on social media in the aftermath of Woolwich: A case study in computational criminology and big data’, British Journal of Criminology

Burnap, P. and Williams, M. (2015) ‘Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making’, Policy & Internet (7:2)

Burnap, P., Williams, M. L. et al. (2014) ‘Tweeting the Terror: Modelling the Social Media Reaction to the Woolwich Terrorist Attack’, Social Network Analysis and Mining (4:2)
