understanding and defending against malicious

1. Malicious Crowdsourcing

2. Understanding Crowdturfing

3. Defense: Machine Learning Classifiers

Adversarial Machine Learning

Understanding and Defending Against Malicious Crowdsourcing

University of California at Santa Barbara

More accurate classifiers can be more vulnerable

Ben Y. Zhao (PI), Haitao Zheng (CoPI), Gang Wang (PhD student),

SANDLab

[1] G. Wang, T. Wang, H. Zheng, B. Zhao. Man vs. Machine: Practical Adversarial Detection of Malicious Crowdsourcing Workers. In Proc. of USENIX Security (2014).

[2] G. Wang, C. Wilson, X. Zhao, Y. Zhu, M. Mohanlal, H. Zheng, B. Zhao. Serf and Turf: Crowdturfing for Fun and Profit. In Proc. of WWW (2012).

Summary

New Threat: Malicious Crowdsourcing = Crowdturfing
+ Hire a large group of real Internet users for malicious attacks
+ Fake reviews, rumors, targeted spam
+ Most existing defenses (e.g., CAPTCHA) fail against real users

Research Questions
+ How does crowdturfing work? [2]
+ What are the scale, economics, and impact of crowdturfing campaigns? [2]
+ How to defend against crowdturfing? [1]

Crowdturfing Sites
+ Web services that recruit Internet users as workers (spam for $)
+ Connect workers to customers who want to run malicious campaigns

Key Players
+ Customers: pay to run a campaign
+ Workers: real users, spam for $
+ Target Networks: social networks, review sites

Scale and Revenue
+ Measurements of the two largest crowdturfing sites (in China)
  - ZBJ (zhubajie.com), five years
  - SDH (sandaha.com), two years
+ 18.5M tasks, 79K campaigns, 180K workers
+ Millions of dollars of revenue per month
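As a rough reading of the totals above, the averages they imply can be worked out directly. This is only back-of-the-envelope arithmetic on the combined figures for both sites; the variable names are illustrative, and averages say nothing about how tasks are distributed across campaigns or workers.

```python
# Combined totals reported on the poster for ZBJ and SDH.
tasks = 18_500_000
campaigns = 79_000
workers = 180_000

# Simple averages implied by those totals (not per-site figures).
tasks_per_campaign = tasks / campaigns
tasks_per_worker = tasks / workers

print(f"about {tasks_per_campaign:.0f} tasks per campaign")
print(f"about {tasks_per_worker:.0f} tasks per worker")
```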

Crowdturfing around the World

ZBJ, SDH; Fiverr, Freelancer, MinuteWorkers, Myeasytasks, Microworkers, Shorttasks; Paisalive

Machine Learning (ML) vs. Crowdturfing
+ Simple methods do not work on real users (e.g., CAPTCHA, rate limits)
+ Machine learning: more sophisticated modeling of user behavior
+ Perfect context to study adversarial machine learning
  - Human workers adapt to evade classifiers
  - Crowdturfing admins can tamper with training data by changing worker behaviors

How Effective Is an ML-based Detector?
+ Ground truth: 28K workers in crowdturfing campaigns on Weibo (Chinese Twitter)
+ Baseline users: 371K Weibo user accounts
+ 30 behavioral features
+ Classifiers: Random Forest, Decision Tree, SVM, Naive Bayes, Bayesian Network

[Figure: False Positive Rate and False Negative Rate (0% to 60%) for each classifier: RF, Tree, SVMr, SVMp, NB, BN]

+ Random Forest is the most accurate (95% accuracy)
+ 99% accuracy on professional workers (>100 tasks)
+ How robust are those classifiers?
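The accuracy, false positive rate, and false negative rate used to compare the classifiers above can be computed from true and predicted labels as in the following minimal sketch. The function name `error_rates` and the toy labels are invented for illustration; they are not data or code from the study.

```python
def error_rates(y_true, y_pred):
    """Return (accuracy, false_positive_rate, false_negative_rate).

    Labels: 1 = crowdturfing worker, 0 = baseline (benign) user.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # FPR: share of benign users wrongly flagged as workers.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # FNR: share of workers the detector missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fpr, fnr


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
accuracy, fpr, fnr = error_rates(y_true, y_pred)
print(accuracy, fpr, fnr)
```

Reporting both error rates matters here because baseline users far outnumber workers (371K vs. 28K), so accuracy alone can hide a high miss rate on the smaller class.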

[Diagram: Training Data, Training (e.g., SVM), Classifier; Poisoning Attack; Evasion Attack]

+ Evasion attack: individual workers change their behavior to evade detection
  - Impact: changing a single feature lets 95% of workers evade detection

+ Poisoning attack: site admins tamper with training data to mislead classifier training
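The evasion attack described above can be illustrated with a toy example. The sketch below assumes a hypothetical linear detector with made-up feature names, weights, and accounts; it is not one of the classifiers or feature sets from the study. It only shows the mechanism: a worker replaces one feature with a benign-typical value and checks whether that change alone is enough to avoid being flagged.

```python
# Hypothetical linear detector: an account is flagged when its score > 0.
# Feature names and weights are assumptions made for this illustration.
WEIGHTS = {"retweet_ratio": 2.0, "tasks_per_day": 0.5, "followers_log": -0.3}
BIAS = -1.5


def score(account):
    return sum(WEIGHTS[name] * account[name] for name in WEIGHTS) + BIAS


def is_flagged(account):
    return score(account) > 0


def evade_with_one_feature(account, benign_profile):
    """Return the first feature whose change alone avoids detection, else None."""
    for name in WEIGHTS:
        altered = dict(account)
        altered[name] = benign_profile[name]  # mimic a typical benign user
        if not is_flagged(altered):
            return name
    return None


# Invented accounts, for illustration only.
benign_profile = {"retweet_ratio": 0.2, "tasks_per_day": 0.0, "followers_log": 2.0}
workers = [
    {"retweet_ratio": 0.9, "tasks_per_day": 1.0, "followers_log": 1.0},
    {"retweet_ratio": 0.8, "tasks_per_day": 2.0, "followers_log": 1.5},
    {"retweet_ratio": 1.0, "tasks_per_day": 6.0, "followers_log": 0.5},
]

evaded = [evade_with_one_feature(w, benign_profile) for w in workers]
print(evaded)
```

In this toy setup the two lighter workers slip past the detector by changing one feature, while the heaviest worker would need to change more than one.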

+ Machine learning classifiers are effective against current crowd-workers
+ Classifiers are highly vulnerable to adversarial attacks. Future work will focus on improving the robustness of ML classifiers

Example: Poisoning Attack
+ Inject mislabeled samples into training data → wrong classifier
  e.g., inject benign accounts as “workers” in training data
+ Uniformly change workers' behavior by enforcing task policies → hard to train an accurate classifier
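The injection variant of the poisoning attack can be sketched with a deliberately simple stand-in classifier. The example below assumes a one-dimensional nearest-centroid rule and invented feature values; it does not reproduce the Random Forest, Decision Tree, or SVM models from the study. It shows how benign accounts mislabeled as "workers" pull the decision threshold toward benign users, which raises the false positive rate.

```python
from statistics import mean


def train_threshold(benign_values, worker_values):
    """1-D nearest-centroid rule: threshold is the midpoint of the class means."""
    return (mean(benign_values) + mean(worker_values)) / 2


def false_positive_rate(threshold, benign_values):
    """Share of benign accounts scored above the threshold (flagged as workers)."""
    flagged = sum(1 for v in benign_values if v > threshold)
    return flagged / len(benign_values)


# Invented values of a single hypothetical behavioral feature.
benign_train = [1, 2, 3, 4, 5] * 2
worker_train = [8, 9, 10, 11, 12] * 2
benign_pool = [1, 2, 3, 4, 5] * 2   # benign accounts the attacker mislabels
benign_test = [1, 2, 3, 4, 5, 6]

fprs = []
for ratio in (0.0, 0.5, 1.0):        # injected samples relative to real workers
    k = int(ratio * len(worker_train))
    poisoned_workers = worker_train + benign_pool[:k]
    threshold = train_threshold(benign_train, poisoned_workers)
    fprs.append(false_positive_rate(threshold, benign_test))

print(fprs)
```

With these toy values the false positive rate grows as the injection ratio grows, which is the qualitative trend the attack relies on; the actual magnitudes depend on the classifier and data.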

[Figure: False Positive Rate (axis 0 to 20) vs. Ratio of Injected Sample to Turfing (0 to 1), for Tree, SVMp, RF, SVMr]

[Diagram: Customer, Crowdturfing Site, Crowd-workers, Target Networks]

[Figure: Site Growth Over Time. Campaigns per Month and Dollars per Month (log-scale axes), Jan. 2008 to Jan. 2011, for ZBJ and SDH]

[Diagram stages: Model Training, Detection]
