bit.ly/malicious: deep dive into short url based e-crime detection

Unifying the Global Response to Cybercrime bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection Neha Gupta, Anupama Aggarwal, Ponnurangam Kumaraguru IIIT-Delhi, India

Upload: precog

Post on 15-Dec-2014


DESCRIPTION

The spread of spam URLs over email and Online Social Media (OSM) has become a massive e-crime problem. URL shortening has gained a lot of traction as a way to avoid long, complex URLs in emails and to work within the character limits imposed by various OSM (like Twitter). URL shorteners take a long URL as input and return a short URL with the same landing page. With their immense popularity, URL shorteners have become a prime target for attackers, who use them to conceal malicious content. Bitly, a leading shortening service, is being exploited heavily to carry out phishing attacks, work-from-home scams, pornographic content propagation, etc. This puts additional pressure on Bitly and other URL shorteners to detect illegitimate content and take timely action against it. In this study, we analyzed a dataset of 763,160 short URLs marked suspicious by Bitly in the month of October 2013. Our results reveal that Bitly is not using its claimed spam detection services very effectively. We also show how a suspicious Bitly account goes unnoticed despite prolonged, recurrent illegitimate activity. Bitly displays a warning page on identification of suspicious links, but we observed this approach to be weak in controlling the overall propagation of spam. We also identified some short URL based features and coupled them with two domain specific features to classify a Bitly URL as malicious or benign, achieving an accuracy of 86.41%. The feature set identified can be generalized to other URL shortening services as well. To the best of our knowledge, this is the first large scale study to highlight the issues with the implementation of Bitly’s spam detection policies and to propose suitable countermeasures.

TRANSCRIPT

Page 1: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Unifying the Global Response to Cybercrime

bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Neha Gupta, Anupama Aggarwal, Ponnurangam Kumaraguru

IIIT-Delhi, India

Page 2: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Presentation Outline

- Problem
- Contribution
- Dataset
- Results
- Conclusions & Future Work

Page 3: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

What are URL shortening services?

[Figure: Long URL -> URL shortening service -> Short URL; one group of services is labeled "Others"]

- Shortens ~80 million links/day
- 2-3 million suspicious/week

Page 4: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Abuse

[Figure: two obfuscation patterns; services shown: is.gd, bit.ly]

- One-level obfuscation: Long malicious URL -> URL shortening service -> Short malicious URL
- Multi-level obfuscation: Long malicious URL -> Not so popular URL shortening service -> Short malicious URL -> Popular URL shortening service -> Short malicious URL
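
The multi-level pattern can be unwound by following each redirect hop until a non-shortener URL is reached. The following is a minimal sketch, assuming an in-memory redirect table in place of real HTTP lookups; the hashes, the landing URL, and the `resolve_chain` and `obfuscation_levels` helpers are hypothetical and not taken from the slides.

```python
from urllib.parse import urlparse

# Hypothetical redirect table standing in for HTTP lookups against shorteners.
REDIRECTS = {
    "http://bit.ly/abc123": "http://is.gd/xyz789",
    "http://is.gd/xyz789": "http://malicious.example/landing",
}
# Hosts treated as URL shorteners in this sketch (an assumption).
SHORTENER_HOSTS = {"bit.ly", "is.gd"}


def resolve_chain(url, max_hops=10):
    """Follow redirects hop by hop and return the list of URLs visited."""
    chain = [url]
    while urlparse(chain[-1]).netloc in SHORTENER_HOSTS and len(chain) <= max_hops:
        nxt = REDIRECTS.get(chain[-1])
        if nxt is None or nxt in chain:  # unknown hash or redirect loop
            break
        chain.append(nxt)
    return chain


def obfuscation_levels(url):
    """Number of shortener hops between the given URL and the landing page."""
    return len(resolve_chain(url)) - 1
```

With this table, a link that passes through two shorteners reports two levels, while a link shortened once reports one.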

Page 5: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Major attacks

[Figure: examples labeled Year 2012, Year 2013, Year 2014, Year 2014]

Page 6: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Bitly's Spam Detection Policies

[Figure: detection services combined with "+", ending in "+ More filters.."; two quoted passages whose text is not recoverable]

Page 7: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Research Contribution

- Impact analysis of malicious Bitly links on OSM
- Identification of issues in Bitly’s spam detection
- Machine learning classification to detect malicious Bitly URLs

Page 8: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Dataset

[Figure: data collection in three phases (Phase 1, Phase 2, Phase 3)]

Datasets, in the order shown:
- Link Dataset (763,160)
- Link Metric Dataset (413,119)
- Encoder/User Metric Dataset (12,344)

Percentages shown: (54.13%), (100%)

Fields shown: Bitly Global Hash, Long URL, #Warnings

Labels shown: link_encoder_info, link_encoder_link_history, link_info, link_expand, link_clicks, link_referring_domains, link_encoders

Page 9: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Domains

- 83.06% of suspicious domains non-existent after 5 months
- Click requests (October 2013): 9,937,250
- Created for spamming and die after achieving significant hits

Page 10: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Network

[Figure: Connected OSM network of all encoders; 5,375 users; shares shown: 63.54%, 17.69%, 18.77%]

Why more Twitter than Facebook?
- Doesn't allow users to connect Facebook brand / fan pages for free

Multiple connections
- 507 malicious users connected multiple Twitter accounts
- 28 malicious users connected at least 10 Twitter accounts

Page 11: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Network

[Figure: workflow starting from Bitly profiles]

- Link history: Bitly warning check
- Connected Twitter accounts (<=200 tweets): Twitter profile Jaccard similarity
- Bitly user name: Bitly profile Jaccard similarity
- Manual annotation based on similarity scores
- 3 malicious communities detected
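
The Jaccard similarity used in this workflow is the size of the intersection of two sets divided by the size of their union. The following is a minimal sketch; the slides do not say which profile fields are compared or how they are tokenized, so the whitespace tokenization and the `pairwise_similarity` helper are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def tokens(text):
    """Lowercased whitespace tokens of a profile's text (assumed tokenization)."""
    return text.lower().split()


def pairwise_similarity(profiles):
    """Map each unordered pair of profile names to its token-set Jaccard score."""
    names = sorted(profiles)
    return {
        (x, y): jaccard(tokens(profiles[x]), tokens(profiles[y]))
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
```

Pairs with high scores would then go to manual annotation, as the workflow describes; any cutoff for "high" is left to the annotator in this sketch.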

Page 12: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Network

- 2 Bitly users with 9 Twitter accounts each
- Similar explicit pornographic content
- Dormant on Bitly, active on Twitter

Page 13: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Efficiency

(a) Malicious link detection
- APWG: 86% undetected
- Virustotal: 71.53% undetected
- SURBL: 36.66% undetected (Bitly claims to use SURBL)

(b) Malicious user profile detection

Page 14: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Efficiency

- 2,018 of 12,344 encoders (16.35%) had a Suspicion Factor of 1; they shortened only suspicious links

[Figure: values shown are 12,344 and 10,326]
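
The slide states only that a Suspicion Factor of 1 means an encoder shortened nothing but suspicious links. A minimal sketch, assuming the factor is the fraction of an encoder's links that are suspicious; that definition, the function names, and the per-encoder counts are assumptions for illustration.

```python
def suspicion_factor(suspicious_links, total_links):
    """Fraction of an encoder's shortened links that are suspicious (assumed definition)."""
    if total_links == 0:
        return 0.0
    return suspicious_links / total_links


def fully_suspicious(encoders):
    """Names of encoders whose every shortened link is suspicious (factor of 1)."""
    return [
        name
        for name, (suspicious, total) in encoders.items()
        if total > 0 and suspicious == total
    ]


# Hypothetical per-encoder counts: (suspicious links, total links).
encoders = {"enc_a": (120, 120), "enc_b": (3, 40), "enc_c": (1, 1)}
flagged = fully_suspicious(encoders)

# Share reported on the slide: 2,018 of 12,344 encoders.
reported_share = round(100 * 2018 / 12344, 2)
```

Comparing integer counts rather than the floating-point ratio avoids rounding issues when testing for a factor of exactly 1.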

Page 15: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Promptness Analysis

- Highly suspicious profiles: user has shortened at least 100 links and Suspicion Factor is 1 (80 profiles)

[Figure: User: bamsesang, Month lag: 24]

Page 16: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Bitly’s response

Page 17: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Malicious Bitly Link Detection: Data Collection and Labeling

[Figure: data collection and labeling workflow; other labels shown: Collect data, unlabeled-dataset, labeled-dataset]

Data Collection
- Tweets from Twitter’s REST API (412,139)
- Extract and expand bitly URLs (34,802)

Data Labeling
- Blacklist + Bitly Warning Check
  1. Google Safe Browsing
  2. SURBL
  3. PhishTank
  4. VirusTotal
- Labels: Malicious, Benign
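
The collection and labeling steps can be sketched as two small functions: one pulls bit.ly URLs out of tweet text, the other labels an expanded URL. This is a sketch under stated assumptions: the regular expression, the in-memory blacklist sets, and the rule "malicious if any blacklist lists the URL or Bitly shows a warning" are illustrative, and no real blacklist API is called.

```python
import re

# Assumed pattern for bit.ly short links; real hashes may follow other rules.
BITLY_RE = re.compile(r"https?://bit\.ly/[A-Za-z0-9_-]+")


def extract_bitly_urls(tweets):
    """Return the unique bit.ly URLs found in tweet texts, in first-seen order."""
    seen = {}
    for text in tweets:
        for url in BITLY_RE.findall(text):
            seen.setdefault(url, None)
    return list(seen)


def label_url(long_url, blacklists, has_bitly_warning):
    """Label a URL malicious if any blacklist lists it or Bitly warns on it."""
    listed = any(long_url in entries for entries in blacklists.values())
    return "malicious" if listed or has_bitly_warning else "benign"


# Hypothetical blacklist contents keyed by service name.
blacklists = {
    "SURBL": {"http://evil.example/x"},
    "PhishTank": set(),
}
```

For instance, `label_url("http://evil.example/x", blacklists, False)` returns `"malicious"` because one blacklist lists the URL.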

Page 18: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Feature Selection

No. Feature Name: Feature Description

WHOIS specific
1. Domain age: Difference between domain creation / update date and expiration date
2. Link creation-domain creation difference: Difference between domain creation date and bitly link creation date

Bitly specific, non-click based
3. Link creation hour: Bitly link creation hour
4. Number of encoders: Number of bitly users who encoded a particular link
5. Anonymous and API encoder ratio: Ratio of encoders that are "anonymous" or a Twitter based application (Twitterfeed, TweetDeck, Tweetbot) to the total number of encoders

Bitly specific, click based
6. Link creation-first click difference: Difference in days between bitly link creation date and date of first click received
7. Referring domains, direct by total: Ratio of referring domains from a direct source to the total number of referring domains
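
The seven features can be computed from a handful of timestamps and lists. A minimal sketch, assuming a hypothetical input record; the dictionary keys, the use of whole days for date differences, and the `"direct"` marker for direct referrers are assumptions, not the authors' data format.

```python
from datetime import datetime

# Encoder names counted as anonymous or Twitter based applications.
ANON_OR_API = {"anonymous", "twitterfeed", "tweetdeck", "tweetbot"}


def ratio(part, whole):
    """Safe division that returns 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def extract_features(link):
    """Compute the seven features from a hypothetical link record."""
    encoders = [e.lower() for e in link["encoders"]]
    referrers = link["referrers"]
    return {
        "domain_age_days": (link["domain_expires"] - link["domain_created"]).days,
        "link_domain_diff_days": (link["link_created"] - link["domain_created"]).days,
        "link_creation_hour": link["link_created"].hour,
        "num_encoders": len(encoders),
        "anon_api_ratio": ratio(sum(e in ANON_OR_API for e in encoders), len(encoders)),
        "first_click_lag_days": (link["first_click"] - link["link_created"]).days,
        "direct_ref_ratio": ratio(referrers.count("direct"), len(referrers)),
    }


# Hypothetical record for illustration only.
example = {
    "domain_created": datetime(2013, 10, 1),
    "domain_expires": datetime(2014, 10, 1),
    "link_created": datetime(2013, 10, 5, 14, 30),
    "first_click": datetime(2013, 10, 6, 9, 0),
    "encoders": ["anonymous", "Twitterfeed", "alice", "bob"],
    "referrers": ["direct", "twitter.com", "direct", "facebook.com"],
}
features = extract_features(example)
```

The two click based entries need click data, so a non-click variant would simply omit `first_click_lag_days` and `direct_ref_ratio`.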

Page 19: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Evaluation Results

Experiment 1
- Mix dataset: click and non-click
- All features
- Precision (random forest): 81.20%

Experiment 2
- Only non-click data
- WHOIS + non-click based features
- Precision (random forest): 89.60%

[Figure: one confusion matrix per experiment, with cells TP, FP, FN, TN]
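
Precision is read off a confusion matrix as TP / (TP + FP). The following is a minimal sketch of how the four cells and the derived metrics are computed; the labels in the example are made up and are not the counts behind the reported 81.20% and 89.60%.

```python
def confusion_counts(y_true, y_pred, positive="malicious"):
    """Count TP, FP, FN, TN for a binary labeling with the given positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Made-up labels for illustration only.
y_true = ["malicious", "malicious", "benign", "benign", "malicious"]
y_pred = ["malicious", "benign", "benign", "malicious", "malicious"]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
```

Precision and accuracy can differ noticeably on the same predictions, which is worth keeping in mind when comparing per-experiment precision with an overall accuracy figure.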

Page 20: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Feature Ranks

Experiment 1 (Rank, Feature)
1. Type of referring domains
2. Link creation-domain creation difference
3. Domain age
4. Link creation hour
5. Type of encoders
6. Link creation-click lag
7. Number of encoders

Experiment 2 (Rank, Feature)
1. Link creation hour
2. Link creation-domain creation difference
3. Domain age
4. Type of encoders
5. Number of encoders

Page 21: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Conclusion & Future Work

- Restricted FB/Twitter connections per profile
- Credibility score per profile
- Bitly specific features in addition to blacklists
- Temporal pattern
- Broaden / generalize features for other URL shorteners
- Browser extension

Page 22: bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Questions?

Thanks to Bitly
- Brian David Eoff
- Mark Josephson

Thank You!