bit.ly/malicious: deep dive into short url based e-crime detection
DESCRIPTION
The existence of spam URLs in emails and on Online Social Media (OSM) has become a massive e-crime problem. To counter the dissemination of long, complex URLs in emails and to work within the character limits imposed by various OSM platforms (like Twitter), the concept of URL shortening has gained a lot of traction. URL shorteners take a long URL as input and return a short URL that leads to the same landing page as the long URL. With their immense popularity over time, URL shorteners have become a prime target for attackers, who use them to conceal malicious content. Bitly, a leading URL shortening service, is being heavily exploited to carry out phishing attacks, work-from-home scams, pornographic content propagation, etc. This puts additional performance pressure on Bitly and other URL shorteners to detect and take timely action against illegitimate content. In this study, we analyzed a dataset of 763,160 short URLs marked suspicious by Bitly in October 2013. Our results reveal that Bitly is not using its claimed spam detection services very effectively. We also show how a suspicious Bitly account goes unnoticed despite prolonged, recurrent illegitimate activity. Bitly displays a warning page when it identifies suspicious links, but we observed this approach to be weak in controlling the overall propagation of spam. We also identified some short URL based features and coupled them with two domain specific features to classify a Bitly URL as malicious or benign, achieving an accuracy of 86.41%. The identified feature set can be generalized to other URL shortening services as well. To the best of our knowledge, this is the first large-scale study to highlight the issues with the implementation of Bitly’s spam detection policies and to propose suitable countermeasures.
TRANSCRIPT
Unifying the Global Response to Cybercrime
bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection
Neha Gupta, Anupama Aggarwal, Ponnurangam Kumaraguru
IIIT-Delhi, India
Presentation Outline
- Problem
- Contribution
- Dataset
- Results
- Conclusions & Future Work
What are URL shortening services?
[Diagram: Long URL → URL shortening service (…, Others) → Short URL]
- Shortens ~80 million links/day
- 2-3 million suspicious/week
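The shortening step itself can be illustrated with a minimal sketch. It assumes a simple in-memory store and a counter encoded in base 62; the class name `ToyShortener` and the `sho.rt` domain are hypothetical, and this is not a description of how Bitly or any other service actually generates its codes.

```python
import string

# Base-62 alphabet: digits, lowercase, uppercase (an assumed scheme, not Bitly's).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base-62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class ToyShortener:
    """Hypothetical in-memory shortener: long URL -> short URL -> same landing page."""

    def __init__(self, domain: str = "https://sho.rt/") -> None:
        self.domain = domain
        self.next_id = 1
        self.code_to_url: dict[str, str] = {}
        self.url_to_code: dict[str, str] = {}

    def shorten(self, long_url: str) -> str:
        # Reuse an existing code so one long URL always maps to one short URL.
        if long_url not in self.url_to_code:
            code = encode_base62(self.next_id)
            self.next_id += 1
            self.url_to_code[long_url] = code
            self.code_to_url[code] = long_url
        return self.domain + self.url_to_code[long_url]

    def expand(self, short_url: str) -> str:
        # Strip the service domain and look up the stored landing page.
        code = short_url.removeprefix(self.domain)
        return self.code_to_url[code]
```

The short code carries no information about the destination, so a reader cannot judge the landing page from the link alone; this is the property that makes shorteners attractive for concealing malicious content.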
Abuse of URL shortening services
- One-level obfuscation: Long malicious URL → URL shortening service → Short malicious URL
- Multi-level obfuscation: Long malicious URL → not-so-popular URL shortening service → Short malicious URL → popular URL shortening service → Short malicious URL
- Example services: is.gd, bit.ly, …
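With multi-level obfuscation, a checker that expands only one hop still sees a short URL instead of the real destination. The sketch below follows a chain of shortener redirects to the final landing page. It assumes the redirect map is already known (in practice each hop would be resolved from an HTTP redirect response), and the host list, short codes and `evil.example` domain are hypothetical.

```python
from urllib.parse import urlparse

# Hosts treated as URL shorteners (illustrative list only).
SHORTENER_HOSTS = {"bit.ly", "is.gd"}


def expand_chain(url: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow shortener redirects until a non-shortener URL is reached.

    `redirects` stands in for HTTP redirect targets.
    Returns the full chain, starting with `url`.
    """
    chain = [url]
    while urlparse(chain[-1]).hostname in SHORTENER_HOSTS:
        if len(chain) > max_hops:
            raise ValueError("too many redirects")
        nxt = redirects.get(chain[-1])
        if nxt is None:
            break  # unknown or dead short link: stop here
        if nxt in chain:
            raise ValueError("redirect loop")
        chain.append(nxt)
    return chain


# A popular shortener wrapping a less popular one (two levels of obfuscation).
redirects = {
    "https://bit.ly/xYz": "https://is.gd/abc",
    "https://is.gd/abc": "http://evil.example/login",
}
chain = expand_chain("https://bit.ly/xYz", redirects)
print(len(chain) - 1, chain[-1])  # 2 http://evil.example/login
```

The number of hops (`len(chain) - 1`) gives the obfuscation level, and only the last element should be submitted to a blacklist.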
Major attacks
[Figure: examples of major attacks from 2012, 2013 and 2014]
Bitly's Spam Detection Policies
[Figure: several filters combined, + more filters, with quoted policy statements]
Research Contribution
- Impact analysis of malicious Bitly links on OSM
- Identification of issues in Bitly’s spam detection
- Machine learning classification to detect malicious Bitly URLs
Dataset
- Phase 1: Link Dataset (763,160 links): Bitly global hash, long URL, #warnings
- Phase 2: Link Metric Dataset (413,119 links, 54.13%): link_info, link_expand, link_clicks, link_referring_domains, link_encoders
- Phase 3: Encoder/User Metric Dataset (12,344 encoders, 100%): link_encoder_info, link_encoder_link_history
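The Phase 2 percentage follows directly from the counts: the Link Metric Dataset covers 413,119 of the 763,160 links in the Link Dataset. A quick arithmetic check:

```python
link_dataset = 763_160         # Phase 1: links marked suspicious
link_metric_dataset = 413_119  # Phase 2: links with metric data

coverage = 100 * link_metric_dataset / link_dataset
print(f"Phase 2 coverage: {coverage:.2f}%")  # Phase 2 coverage: 54.13%
```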
Domains
- 83.06% of suspicious domains were non-existent after 5 months
- Click requests (October 2013): 9,937,250
- Domains are created for spamming and die after achieving significant hits
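One way to measure how many suspicious domains have disappeared is to test whether each one still resolves in DNS. The sketch below is an assumption about the method, not necessarily the check used in the study; the resolver is passed in as a parameter so the computation can run offline, and `socket.gethostbyname` is shown as one possible live resolver.

```python
import socket
from typing import Callable, Iterable


def dns_resolves(domain: str) -> bool:
    """Return True if the domain currently resolves to an IPv4 address."""
    try:
        socket.gethostbyname(domain)
        return True
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return False


def dead_domain_percentage(domains: Iterable[str],
                           resolves: Callable[[str], bool] = dns_resolves) -> float:
    """Percentage of distinct domains that no longer resolve."""
    unique = set(domains)
    if not unique:
        return 0.0
    dead = sum(1 for d in unique if not resolves(d))
    return 100 * dead / len(unique)


# Offline usage with a stubbed resolver (hypothetical domains).
alive = {"example.com"}
pct = dead_domain_percentage(
    ["example.com", "spam1.example", "spam2.example", "spam1.example"],
    resolves=lambda d: d in alive,
)
```

Domains are de-duplicated first, so a spam domain behind many short links counts once.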
Network
Connected OSM network of all encoders (5,375 users)
[Figure: network breakdown 63.54% / 17.69% / 18.77%]
Why more Twitter than Facebook?
- Bitly doesn't allow users to connect Facebook brand / fan pages for free
Multiple connections
- 507 malicious users connected multiple Twitter accounts
- 28 malicious users connected at least 10 Twitter accounts
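The multiple-connection counts can be derived from a mapping of each malicious Bitly user to the Twitter accounts connected to that profile. The data structure and sample names below are hypothetical; only the two thresholds (more than one account, at least 10 accounts) come from the slide.

```python
def connection_counts(connections: dict[str, set[str]]) -> tuple[int, int]:
    """Return (#users with multiple Twitter accounts, #users with >= 10 accounts)."""
    multiple = sum(1 for accounts in connections.values() if len(accounts) > 1)
    ten_or_more = sum(1 for accounts in connections.values() if len(accounts) >= 10)
    return multiple, ten_or_more


# Hypothetical sample: Bitly user -> connected Twitter accounts
sample = {
    "user_a": {"tw1"},
    "user_b": {"tw2", "tw3"},
    "user_c": {f"tw_c{i}" for i in range(12)},
}
print(connection_counts(sample))  # (2, 1)
```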
Network
- Bitly profiles (link history) → Bitly warning check
- Connected Twitter accounts (<=200 tweets) → Twitter profile Jaccard similarity
- Bitly user name → Bitly profile Jaccard similarity
- Manual annotation based on similarity scores
- Result: 3 malicious communities detected
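The similarity step compares profiles as sets using the Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B|. The slides do not say which tokens form the sets; the sketch below assumes lowercase word tokens from profile text (for example tweets or link history), and the 0.5 threshold for sending a pair to manual annotation is a hypothetical choice.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercase word tokens (assumed tokenization)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(profiles: dict[str, str],
                    threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Profile pairs whose similarity reaches the (hypothetical) threshold."""
    token_sets = {name: tokens(text) for name, text in profiles.items()}
    pairs = []
    for p, q in combinations(sorted(token_sets), 2):
        score = jaccard(token_sets[p], token_sets[q])
        if score >= threshold:
            pairs.append((p, q, score))
    return pairs


profiles = {
    "acct1": "free pics click here now",
    "acct2": "free pics click here",
    "acct3": "weather update for delhi",
}
print(candidate_pairs(profiles))  # [('acct1', 'acct2', 0.8)]
```

Pairs above the threshold are only candidates; the final community assignment on the slide was made by manual annotation.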
Network
- 2 Bitly users with 9 Twitter accounts each
- Similar explicit pornographic content
- Dormant on Bitly, active on Twitter
Efficiency
(a) Malicious link detection
- APWG: 86% undetected
- VirusTotal: 71.53% undetected
- SURBL: 36.66% undetected (Bitly claims to use SURBL)
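Each undetected percentage is the share of suspicious links that a given blacklist did not flag. A minimal sketch, assuming the blacklist lookups have already been run and stored as sets of flagged long URLs; the URLs and per-blacklist results below are made up and are not the study's data.

```python
def undetected_rate(links: list[str], flagged: set[str]) -> float:
    """Percentage of links that the blacklist did not flag."""
    if not links:
        return 0.0
    missed = sum(1 for url in links if url not in flagged)
    return 100 * missed / len(links)


# Hypothetical sample: 4 suspicious links and per-blacklist results
links = ["http://a.example/", "http://b.example/", "http://c.example/", "http://d.example/"]
results = {
    "SURBL": {"http://a.example/", "http://b.example/", "http://c.example/"},
    "VirusTotal": {"http://a.example/"},
}
for name, flagged in results.items():
    print(f"{name}: {undetected_rate(links, flagged):.2f}% undetected")
# SURBL: 25.00% undetected
# VirusTotal: 75.00% undetected
```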
(b) Malicious user profile detection
Efficiency
- 2,018/12,344 encoders (16.35%) had a Suspicion Factor = 1, i.e., they shortened only suspicious links
[Figure: 12,344 encoders in total; 10,326 with Suspicion Factor < 1]
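The slide does not give a formula, but a Suspicion Factor of 1 is described as a user who shortened only suspicious links, which is consistent with the ratio of a user's suspicious links to all links that user shortened. The sketch below assumes that definition and checks the slide's arithmetic (2,018 of 12,344 encoders is 16.35%, leaving 10,326).

```python
def suspicion_factor(suspicious_links: int, total_links: int) -> float:
    """Assumed definition: fraction of a user's shortened links marked suspicious."""
    if total_links <= 0:
        raise ValueError("user has no shortened links")
    return suspicious_links / total_links


total_encoders = 12_344
only_suspicious = 2_018  # encoders with Suspicion Factor = 1

print(f"{100 * only_suspicious / total_encoders:.2f}%")  # 16.35%
print(total_encoders - only_suspicious)                   # 10326
print(suspicion_factor(40, 40), suspicion_factor(3, 12))  # 1.0 0.25
```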
Promptness Analysis
- Highly suspicious profiles: user has shortened at least 100 links and has a Suspicion Factor of 1 → 80 profiles
- Example: user bamsesang, month lag: 24
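The highly suspicious filter combines two conditions: at least 100 shortened links and a Suspicion Factor of 1. The sketch applies that filter and computes a month lag as whole calendar months between two dates. Which two events the study's month lag measures is not stated on the slide, so the `first_seen` and `last_seen` fields, and the sample profiles, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Profile:
    name: str
    total_links: int
    suspicious_links: int
    first_seen: date  # hypothetical field
    last_seen: date   # hypothetical field


def is_highly_suspicious(p: Profile) -> bool:
    """At least 100 shortened links and Suspicion Factor = 1 (every link suspicious)."""
    return p.total_links >= 100 and p.suspicious_links == p.total_links


def month_lag(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


profiles = [
    Profile("user_x", 150, 150, date(2011, 10, 5), date(2013, 10, 20)),
    Profile("user_y", 150, 140, date(2012, 1, 1), date(2013, 1, 1)),
    Profile("user_z", 50, 50, date(2012, 1, 1), date(2013, 1, 1)),
]
flagged = [p for p in profiles if is_highly_suspicious(p)]
print([(p.name, month_lag(p.first_seen, p.last_seen)) for p in flagged])  # [('user_x', 24)]
```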
Bitly’s response
Malicious Bitly Link Detection
Data Collection and Labeling
Data Collection:
- Tweets from Twitter’s REST API (412,139)
- Extract and expand Bitly URLs (34,802)
- Collect data
Data Labeling:
- Blacklist + Bitly warning check: 1. Google Safe Browsing 2. SURBL 3. PhishTank 4. VirusTotal
- Output: labeled-dataset (Malicious, Benign) and unlabeled-dataset
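The labeling step marks a link malicious when any of the four blacklists or Bitly's warning flags it. The slides do not spell out how benign and unlabeled links were separated; the sketch assumes a link is benign only if every check returned a clean result, and unlabeled otherwise. The input format is hypothetical and no real blacklist API is called.

```python
from typing import Optional

BLACKLISTS = ("Google Safe Browsing", "SURBL", "PhishTank", "VirusTotal")


def label_link(checks: dict[str, Optional[bool]], bitly_warning: Optional[bool]) -> str:
    """Return 'malicious', 'benign' or 'unlabeled'.

    `checks` maps each blacklist name to True (flagged), False (checked, clean)
    or None (no result); `bitly_warning` uses the same convention.
    """
    results = [checks.get(name) for name in BLACKLISTS] + [bitly_warning]
    if any(r is True for r in results):
        return "malicious"
    if all(r is False for r in results):
        return "benign"
    return "unlabeled"
```

Treating missing results as "unlabeled" rather than "benign" keeps unchecked links out of the training data.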
Feature Selection
No. | Feature Name | Feature Description | Type
1 | Domain age | Difference between domain creation/update date and expiration date | WHOIS specific
2 | Link creation-domain creation difference | Difference between domain creation date and Bitly link creation date | WHOIS specific
3 | Link creation hour | Bitly link creation hour | Bitly specific, non-click based
4 | Number of encoders | Number of Bitly users who encoded a particular link | Bitly specific, non-click based
5 | Anonymous and API encoder ratio | Ratio of encoders that are "anonymous" or a Twitter-based application (Twitterfeed, TweetDeck, Tweetbot) to the total number of encoders | Bitly specific, non-click based
6 | Link creation-first click difference | Difference in days between Bitly link creation date and date of first click received | Bitly specific, click based
7 | Referring domains: direct by total | Ratio of referring domains from a direct source to the total number of referring domains | Bitly specific, click based
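Several of these features are simple date and ratio computations. A minimal sketch for features 1, 2, 3, 5 and 7, assuming WHOIS dates and Bitly link metadata are already available as Python values; the function names, the `"direct"` referrer label and the lowercase client names are assumptions, not Bitly's or any WHOIS library's actual field names. Using days as the unit for features 1 and 2 is also an assumption, since the table specifies days only for feature 6.

```python
from datetime import datetime

# Twitter-based applications named in the feature table (feature 5).
API_CLIENTS = {"twitterfeed", "tweetdeck", "tweetbot"}


def domain_age_days(created: datetime, expires: datetime) -> int:
    """Feature 1: days between domain creation (or update) and expiration."""
    return (expires - created).days


def link_domain_gap_days(domain_created: datetime, link_created: datetime) -> int:
    """Feature 2: days between domain creation and Bitly link creation."""
    return (link_created - domain_created).days


def link_creation_hour(link_created: datetime) -> int:
    """Feature 3: hour of day (0-23) at which the Bitly link was created."""
    return link_created.hour


def anon_api_encoder_ratio(encoders: list[str]) -> float:
    """Feature 5: share of encoders that are 'anonymous' or a Twitter-based app."""
    if not encoders:
        return 0.0
    hits = sum(1 for e in encoders if e.lower() == "anonymous" or e.lower() in API_CLIENTS)
    return hits / len(encoders)


def direct_referrer_ratio(referring_domains: list[str]) -> float:
    """Feature 7: share of referring domains that are direct (no referrer)."""
    if not referring_domains:
        return 0.0
    return sum(1 for d in referring_domains if d == "direct") / len(referring_domains)
```

A very small gap in feature 2 (a link created days after its domain was registered) is the kind of signal that fits the "created for spamming" pattern noted for suspicious domains.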
Evaluation Results
Experiment 1: Mix dataset (click and non-click), all features
- Precision (random forest): 81.20%
Experiment 2: Only non-click data, WHOIS + non-click based features
- Precision (random forest): 89.60%
[Figure: confusion matrices (TP, FP, FN, TN) for Experiments 1 and 2]
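Both experiments report precision, while the abstract reports accuracy; these are different quantities computed from the same confusion matrix (TP, FP, FN, TN). A short sketch of the standard definitions, with "malicious" as the positive class; the counts in the example are made up and are not the study's results.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard binary-classification metrics from a confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # flagged links that are truly malicious
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # malicious links that were flagged
        "accuracy": (tp + tn) / total if total else 0.0,  # all correct decisions
    }


# Hypothetical confusion matrix
m = metrics(tp=90, fp=10, fn=20, tn=80)
print({k: round(v, 4) for k, v in m.items()})
# {'precision': 0.9, 'recall': 0.8182, 'accuracy': 0.85}
```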
Feature Ranks
Experiment 1
Rank | Feature
1 | Type of referring domains
2 | Link creation-domain creation difference
3 | Domain age
4 | Link creation hour
5 | Type of encoders
6 | Link creation-click lag
7 | Number of encoders
Experiment 2
Rank | Feature
1 | Link creation hour
2 | Link creation-domain creation difference
3 | Domain age
4 | Type of encoders
5 | Number of encoders
Conclusion & Future Work
- Restricted FB/Twitter connections per profile
- Credibility score per profile
- Bitly specific features in addition to blacklists
- Temporal pattern
- Broaden / generalize features for other URL shorteners
- Browser extension
Questions?
Thanks to Bitly: Brian David Eoff, Mark Josephson
Thank You!