Understanding, Characterizing, and Detecting
Facebook Like Farms
Cambridge, 22 March 2016
Emiliano De Cristofaro
https://emilianodc.com
2
Facebook and Ads
1 billion users, $3 billion ad revenue
Brands create a “page” to engage customers
FC Barcelona (90M+ likes), Shakira (100M+)
40M+ small businesses with active pages, 2M of them use ads to promote
Value of a “like” highly debatable...
From $214.81 (Blackbaud) to $8 (ChompOn)
http://valueofalike.com/
3
How can I get “likes”?
Facebook page ads
Cost-per-click or cost-per-impression
Variable price and amount
Like Farms, including:
boostlikes.com
socialformula.com
authenticlikes.com
mammothsocials.com
Reports that “farmers” like a lot of other pages too...
4
Calls for a measurement study...
Created 13 Facebook honeypot pages
“Virtual Electricity”
Description: “this is not a real page, please do not like it”
Two promotion methods
1. Like farms
2. Facebook ads
Anonymized data collection, ethical approval, …
E. De Cristofaro, A. Friedman, G. Jourjon, M.A. Kaafar, M. Zubair Shafiq. Paying for Likes? Understanding Facebook Like Fraud Using Honeypots. ACM IMC 2014.
5
Campaign  Provider        Location   Budget  Duration  #Likes
1         Facebook        USA        $6/day  15 days   32
2         Facebook        France     $6/day  15 days   44
3         Facebook        India      $6/day  15 days   518
4         Facebook        Egypt      $6/day  15 days   691
5         Facebook        Worldwide  $6/day  15 days   484
6         BoostLikes      Worldwide  $70     15 days   -
7         BoostLikes      USA        $190    15 days   624
8         SocialFormula   Worldwide  $14     3 days    984
9         SocialFormula   USA        $70     3 days    738
10        AuthenticLikes  Worldwide  $50     3-5 days  755
11        AuthenticLikes  USA        $60     3-5 days  1038
12        MammothSocials  Worldwide  $20     -         -
13        MammothSocials  USA        $95     5 days    317
6
Temporal Analysis
SocialFormula campaign acquires likes in a short time window
Bot-operated (“lock-step” behavior)
BoostLikes campaign acquires likes gradually
Manual process or deliberately slow to avoid suspicion
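The contrast between bursty and gradual campaigns can be made concrete with a simple statistic: the largest share of a page's likes that falls inside any short time window. The sketch below is only an illustration of that idea. The function name, the 2-hour window, and both timestamp lists are assumptions invented for the example, not data or code from the study.

```python
from bisect import bisect_left

def max_window_share(timestamps, window_seconds):
    """Largest fraction of likes that falls inside any window of the given length."""
    ts = sorted(timestamps)
    if not ts:
        return 0.0
    best = 0
    for i, start in enumerate(ts):
        # Index of the first like at or after the end of the window [start, start + window).
        end = bisect_left(ts, start + window_seconds)
        best = max(best, end - i)
    return best / len(ts)

HOUR = 3600
DAY = 24 * HOUR

# Hypothetical like timestamps in seconds since campaign start (invented for illustration).
# "bursty": 90 likes one minute apart on day 1, then one like per day for 10 days.
bursty = [DAY + 60 * k for k in range(90)] + [(2 + d) * DAY for d in range(10)]
# "gradual": 100 likes spread evenly over 15 days.
gradual = [12960 * k for k in range(100)]

print(max_window_share(bursty, 2 * HOUR))   # share of likes in the densest 2-hour window
print(max_window_share(gradual, 2 * HOUR))
```

A value close to 1 points to lock-step delivery, while a value close to 1/N is what a slow, evenly paced campaign would produce.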
7
Location Analysis
SocialFormula likes from Turkey
AuthenticLikes spread out across many countries
8
Social Graph Analysis
AuthenticLikes and MammothSocials have some common users
BoostLikes likers are well-connected
9
Like Analysis
Like farm profiles like a lot of pages (median 1-2K)
Exception: BoostLikes worldwide campaign
Facebook campaign likers also like a lot of pages (median 800-1200)
10
Like Analysis
Likers tend to like similar pages
Many likers like popular pages
Football stars
Mobile phones
Tech companies
11
Two Main Modi Operandi
Some farms seem to be operated by bots and do not try to hide
Bursts of activity, few friends
Some are stealthier
Mimic real behavior, well-connected network structure
12
Facebook Detection
Revisited liker accounts after 1 month
Accounts may have been terminated by the user or by Facebook
Only a small fraction of liker accounts were terminated
Campaign  Provider  Location   #Likes  #Closed
1         Facebook  USA        32      0
2         Facebook  France     44      0
3         Facebook  India      518     2
4         Facebook  Egypt      691     6
5         Facebook  Worldwide  484     3
6         BL        Worldwide  -       -
7         BL        USA        624     1
8         SF        Worldwide  984     11
9         SF        USA        738     9
10        AL        Worldwide  755     8
11        AL        USA        1038    36
12        MS        Worldwide  -       -
13        MS        USA        317     9
13
Detecting Fake Likes?
Temporal burst of likes
[WWW’13, KDD’14] – CopyCatch algorithm
Cluster based on similar actions
[CCS’14] – SynchroTrap algorithm
Like distributions (spatial)
[USENIX’14] – PCA anomaly detection
Facebook essentially applies graph co-clustering fraud detection (SynchroTrap, CopyCatch)
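To give a feel for what "cluster based on similar actions" means, here is a toy sketch that groups users whose likes hit the same pages in the same time bucket. It is not the CopyCatch or SynchroTrap algorithm; it only illustrates the general idea under simplifying assumptions. The fixed hourly buckets, the Jaccard threshold of 0.5, the minimum group size, and the sample likes are all invented for the example.

```python
from itertools import combinations

def action_sets(likes, bucket_seconds):
    """Map each user to the set of (page, time bucket) pairs they liked."""
    actions = {}
    for user, page, ts in likes:
        actions.setdefault(user, set()).add((page, ts // bucket_seconds))
    return actions

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def synchronized_groups(likes, bucket_seconds=3600, threshold=0.5, min_size=3):
    """Link users with similar action sets, then return the large linked groups."""
    actions = action_sets(likes, bucket_seconds)
    parent = {u: u for u in actions}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in combinations(actions, 2):
        if jaccard(actions[u], actions[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for u in actions:
        groups.setdefault(find(u), set()).add(u)
    return [g for g in groups.values() if len(g) >= min_size]

# Hypothetical (user, page, timestamp) likes, invented for illustration.
likes = [(bot, page, 1000 + 10 * i)
         for bot in ("b1", "b2", "b3")
         for i, page in enumerate(("p1", "p2", "p3"))]
likes += [("h1", "p1", 50000), ("h1", "p9", 90000),
          ("h2", "p2", 200000), ("h2", "p8", 5000)]

groups = synchronized_groups(likes)
print(groups)  # one group containing b1, b2 and b3
```

The pairwise comparison is quadratic in the number of users, so this only works as an illustration on small inputs.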
14
Efficacy of Co-Clustering?
Pretty sure it works well on non-stealthy farms, but what about the stealthy ones?
Let’s measure this stuff up (again!)
Well, first we need to re-crawl
M. Ikram, L. Onwuzurike, S. Farooqi, E. De Cristofaro, A. Friedman, G. Jourjon, M.A. Kaafar, M. Zubair Shafiq. Combating Fraud in Online Social Networks: Characterizing and Detecting Facebook Like Farms. Available from http://arxiv.org/abs/1506.00506
15
New Dataset
Campaign  #Users  #Pages Liked  #Unique  #Posts
BL-USA    583     79,025        37,283   44,566
SF-ALL    870     879,369       108,020  46,394
SF-USA    653     340,964       75,404   38,999
AL-ALL    707     162,686       46,230   61,575
AL-USA    827     441,187       141,214  30,715
MS-USA    259     412,258       141,262  12,280
Baseline  1,408   79,247        57,384   34,903
16
Co-Clustering Results
Campaign  TP   FP   TN   FN  Precision  Recall  F1
AL-USA    681  9    569  4   98%        99%     99%
AL-ALL    448  53   527  1   89%        99%     94%
BL-USA    523  588  18   0   47%        100%    64%
SF-USA    428  67   512  1   86%        100%    94%
SF-ALL    431  48   530  2   90%        99%     95%
MS-USA    201  22   549  2   90%        99%     93%
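For readers who want to see how the last three columns follow from the first four, the sketch below recomputes precision, recall, and F1 from the BL-USA counts in the table using the standard definitions. The function name is an assumption made for the example.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# BL-USA row of the table: TP=523, FP=588, FN=0.
p, r, f1 = precision_recall_f1(tp=523, fp=588, fn=0)
print(f"{p:.0%} {r:.0%} {f1:.0%}")  # prints "47% 100% 64%"
```

The BL-USA row shows why recall alone is misleading here: every farm account is caught, but more than half of the flagged accounts are false positives.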
18
But... BL-USA
19
What if we use lexical features?
Posts and comments from timelines
Term frequency-inverse document frequency (TF-IDF)
Using TF-IDF features, train SVM
Not the best!
Campaign  Total Users  Training Set  Testing Set  TP   FP  TN   FN   Precision  Recall  Accuracy  F1
AL-USA    827          661           204          103  9   229  101  92%        50%     75%       65%
AL-ALL    707          566           141          101  1   237  40   99%        72%     89%       83%
BL-USA    583          468           115          78   1   237  37   99%        68%     89%       80%
SF-USA    652          522           130          83   0   238  47   100%       89%     73%       84%
SF-ALL    870          697           173          128  3   235  45   88%        98%     74%       84%
MS-USA    259          210           49           32   5   233  17   86%        65%     92%       74%
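As a reminder of what the TF-IDF features on this slide look like, here is a minimal sketch of the weighting step only. It assumes whitespace tokenization and the plain tf * log(N/df) variant. The slides do not say which variant or tokenizer was used, the SVM training step is not shown, and the sample posts are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({term: (count / total) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return vectors

# Hypothetical timeline posts, invented for illustration.
posts = ["free likes free followers",
         "match tonight with friends",
         "free match tickets"]
vectors = tf_idf(posts)
print(vectors[0])
```

Terms that appear in every document get weight zero, so the features emphasize vocabulary that distinguishes one timeline from another.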
20
How can we characterize timeline features?
Idea: let’s look at users’ interaction with posts and their lexical features
First, we look at types of posts on timelines
21
#comments per post
22
#likes per post
23
#shared posts
24
#words per post
25
Lexical Analysis
Campaign  Avg #Chars  Avg #Words  Avg #Sentences  Avg Sentence Length  Avg Word Length  Richness  ARI   Flesch Score
Baseline  4,477       780         67              6.9                  17.6             0.7       20.2  55.1
AL-ALL    2,835       464         32              6.2                  13.9             0.59      14.8  43.6
AL-USA    2,475       394         33              6.2                  12.7             0.49      14.1  54
BL-USA    7,356       1,330       63              5.7                  22.8             0.58      16.9  51.5
MS-USA    6,227       1,047       66              6.1                  17.8             0.53      16.2  50.1
SF-ALL    1,438       227         19              6.3                  11.7             0.58      14.1  45.2
SF-USA    1,637       259         22              6.3                  12               0.55      14.4  45.6
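Several columns in the table are simple text statistics. The sketch below shows one way to compute them for a single piece of text. ARI uses the standard Automated Readability Index formula, 4.71 * (characters/words) + 0.5 * (words/sentences) - 21.43. Treating richness as the ratio of distinct words to total words is an assumption, since the slides do not define it. The tokenization rules are also assumptions, and the Flesch score is left out because it needs syllable counts.

```python
import re

def lexical_stats(text):
    """Basic lexical statistics for one text; returns None if there are no words."""
    # Assumed tokenization: sentences end at ., ! or ?; words are alphanumeric runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words or not sentences:
        return None
    chars = sum(len(w) for w in words)
    avg_word_length = chars / len(words)
    avg_sentence_length = len(words) / len(sentences)
    return {
        "chars": chars,
        "words": len(words),
        "sentences": len(sentences),
        "avg_word_length": avg_word_length,
        "avg_sentence_length": avg_sentence_length,
        # Assumed definition: distinct words / total words (type-token ratio).
        "richness": len({w.lower() for w in words}) / len(words),
        # Automated Readability Index (standard formula).
        "ari": 4.71 * avg_word_length + 0.5 * avg_sentence_length - 21.43,
    }

stats = lexical_stats("The cat sat. The dog ran fast.")
print(stats)
```

The values in the table are per-campaign averages over many users, so running this on a single post will not reproduce them.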
26
27
Conclusion
Like farms are widespread
Moderate fraud impact in isolation, but unclear how much fraudsters mess with the advertising platform
Some are easy to spot, some less so
Future/ongoing work:
1. Reputation Manipulation
2. Measuring Page Engagement
3. Understanding Farm Ecosystem