understanding and defending against malicious
TRANSCRIPT
2. Understanding Crowdturfing1. Malicious Crowdsourcing
3. Defense: Machine Learning Classifiers Adversarial Machine Learning
Understanding and Defending Against Malicious CrowdsourcingUniversity of California at Santa Barbara
More accurate classifiers can be more vulnerable
Ben Y. Zhao (PI), Haitao Zheng (CoPI), Gang Wang (PhD student),
SANDLabhttp://[email protected]
[1] G. WANG, T. WANG, H. ZHENG, B. ZHAO. Man vs. machine: practical adversarial detection of malicious crowdsourcing workers. In Proc. of Usenix Security (2014)[2] G. WANG, C. WILSON, X. ZHAO, Y. ZHU, M. MOHANLAL, H. ZHENG, B. ZHAO. Serf and turf: crowdturfing for fun and profit. In Proc. of WWW (2012)
Summary
New Threat: Malicious Crowdsourcing = Crowdturfing + Hire a large group of real Internet users for malicious attacks + Fake reviews, rumors, targerted spam + Most existing defenses failed against real users (e.g., CAPTCHA)
Research Questions + How does crowdturfing work? [1]
+ What’s the scale, economics and impact of crowturfing campaigns? [1] + How to defend against crowdturfing? [2]
Crowdturfing Sites + Web services that recruit Internet users as workers (spam for $) + Connect workers to customers who want to run malicious campaigns
Key Players + Customers: pay to run a campaign + Workers: real users, spam for $ + Target Networks: social networks, revew sites
Scale and Revenue + Measurements of two largest crowdturfing sites (in China) - ZBJ (zhubajie.com), five years - SDH (sandaha.com), two yeras + 18.5M tasks, 79K campaigns, 180K workers + Millions dollars of revenue per month
Crowdturfing around the World
ZBJ, SDH Fiverr, Freelancer, MinuteWorkers, Myeasytasks, Microworkers, Shorttasks Paisalive
Machine Learning (ML) vs. Crowdturfing+ Simple method does not work on real users (e.g., CAPTCHA, rate limit)+ Machine learning: more sophiscaed modeling on user behaviors+ Perfect context to study adversarial machine learning - Human workers are adaptive to evade classifiers - Crowdturf admins can temper with training data by chaning worker behaviors
How Effective is ML-based Detecor?+ Groundtruth: 28K workers in crowdturfing campaigns on Weibo (Chinenes Twitter)+ Baseline users: 371K Weibo user accounts+ 30 behavioral features+ Classiiers: Random Forest, Decision Tree, SVM, Naive Bayes, Bayesian Network
0%10%20%30%40%50%60%
RF Tree SVMr SVMp NB BN
False Positive RateFalse Negative Rate
+ Random Forest is the most accurate (95% accuracy) + 99% accuracy on professional workers (>100 tasks) + How robust are those classifiers?
Classifier
Training Data
Traininge.g. SVM
Poisning Attack
Evasion Attack
+ Evasion attack: individual workers change behaviors to evade the detection - Impact: single feature-change saves 95% of workers
+ Poisoning attack: site admins tamper with training data to mislead classifier training
+ Machine learning classifiers are effective against current crowd-workers+ Classifiers are highly vulnerable to adversarial attacks. Future works will focus on improving the robustiness of ML-classifiers
Example: Poisoning Attack+ Inject mislabeled samples to training data wrong classifier e.g., inject benign accounts as “workers” in training data + Uniformly change workers behavior by enforcing task policies hard to train an accurate classifier
0
5
10
15
20
0 0.2 0.4 0.6 0.8 1
Fals
e Po
sitiv
e Ra
te
Ratio of Injected Sample to Turfing
Tree
SVMpRF
SVMr
CustomerCrowdturfing Site Target Networks
Crowd-workers
1
10
100
1000
10000
100000
1000000
1
10
100
1000
10000
Jan. 2008 Jan. 2009 Jan. 2010 Jan. 2011
Cam
paig
ns p
er M
onth
Dolla
rs p
er M
onth
Site Growth Over Time
CampaignsCampaigns
$
$ZBJ
SDH
Model Training
Detection