bit.ly/malicious: deep dive into short url based e-crime detection

Unifying the Global Response to Cybercrime

bit.ly/malicious: Deep Dive into Short URL based e-Crime Detection

Neha Gupta, Anupama Aggarwal, Ponnurangam Kumaraguru

IIIT-Delhi, India


Presentation Outline

• Problem
• Contribution
• Dataset
• Results
• Conclusions & Future Work


What are URL shortening services?

[Figure: Long URL -> URL shortening service -> Short URL; services shown: …, Others]

• Shortens ~80 million links/day
• 2-3 million suspicious/week


Abuse

One-level obfuscation

Long malicious URL -> URL shortening service -> Short malicious URL

Multi-level obfuscation

Long malicious URL -> Not so popular URL shortening service -> Short malicious URL -> Popular URL shortening service -> …

Services shown: is.gd, bit.ly
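The obfuscation levels above can be illustrated by following a short URL hop by hop until no further redirect exists. This is a minimal sketch that assumes an in-memory redirect table instead of real HTTP requests; `expand_chain`, `obfuscation_levels`, and the example URLs are hypothetical names introduced for illustration only.

```python
from typing import Dict, List


def expand_chain(url: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow redirects from `url` and return every URL visited, in order."""
    chain = [url]
    seen = {url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # redirect loop: stop following
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


def obfuscation_levels(chain: List[str]) -> int:
    """Number of shortening hops between the first URL and the final one."""
    return len(chain) - 1


# Hypothetical redirect table standing in for live shortener lookups.
REDIRECTS = {
    "http://bit.ly/abc": "http://is.gd/xyz",
    "http://is.gd/xyz": "http://malicious.example/payload",
}

chain = expand_chain("http://bit.ly/abc", REDIRECTS)
print(chain, obfuscation_levels(chain))
```

In this sketch a chain with one hop corresponds to one-level obfuscation, and two or more hops to multi-level obfuscation.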


Major attacks

[Figure: attack examples labeled Year 2012, Year 2013, Year 2014, Year 2014]


Bitly's Spam Detection Policies

[Figure: several filters joined by "+", ending with "+ More filters.."; two quoted excerpts whose text is not recoverable]


Research Contribution

• Impact analysis of malicious Bitly links on online social media (OSM)
• Identification of issues in Bitly's spam detection
• Machine learning classification to detect malicious Bitly URLs


Dataset

Diagram labels: link_encoder_info, link_encoder_link_history, link_info, link_expand, link_clicks, link_referring_domains, link_encoders

Fields: Bitly Global Hash, Long URL, #Warnings

Phase 1: Link Dataset (763,160)
Phase 2: Link Metric Dataset (413,119) (54.13%)
Phase 3: Encoder/User Metric Dataset (12,344) (100%)


Domains

• 83.06% of suspicious domains were non-existent after 5 months
• Click requests (October 2013): 9,937,250
• These domains are created for spamming and die after achieving significant hits


Network

[Figure: Connected OSM network of all encoders; 5,375 users; shares shown: 63.54%, 17.69%, 18.77%]

Why more Twitter than Facebook?
• Bitly doesn't allow users to connect Facebook brand / fan pages for free

Multiple connections
• 507 malicious users connected multiple Twitter accounts
• 28 malicious users connected at least 10 Twitter accounts


Network Bitly profiles

(Link history)

Bitly warning check

(Connected Twitter accounts)

(<=200 tweets)

Twitter profile Jaccard Similarity

(Bitly user name)

Bitly profile Jaccard Similarity

Manual annotation based on similarity scores

3 malicious communities detected
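The Jaccard similarity used in the steps above is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; tokenizing profile text into lowercase words is an assumption made here for illustration, and the slides do not state how profiles were turned into sets.

```python
from typing import Iterable, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace-separated words (assumed tokenization)."""
    return set(text.lower().split())


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical profile snippets, not taken from the dataset.
profile_1 = tokens("free followers click here")
profile_2 = tokens("click here free gift")
score = jaccard(profile_1, profile_2)
print(score)
```

Pairs of profiles with high scores would then be candidates for the manual annotation step.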



Network

• 2 Bitly users with 9 Twitter accounts each
• Similar explicit pornographic content
• Dormant on Bitly, active on Twitter


Efficiency

(a) Malicious link detection

• APWG: 86% undetected
• VirusTotal: 71.53% undetected
• SURBL: 36.66% undetected (Bitly claims to use SURBL)

(b) Malicious user profile detection


Efficiency

2,018 of 12,344 encoders (16.35%) had a Suspicion Factor of 1, i.e. they shortened only suspicious links

[Figure: chart values 12,344 and 10,326]
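A Suspicion Factor of 1 is described as "shortened only suspicious links", which suggests the factor is the fraction of an encoder's links that are suspicious. The sketch below assumes that definition; the function name and the boolean-flag input are hypothetical. It also checks the share reported on the slide.

```python
from typing import Sequence


def suspicion_factor(link_is_suspicious: Sequence[bool]) -> float:
    """Fraction of an encoder's links flagged suspicious (assumed definition)."""
    if not link_is_suspicious:
        return 0.0
    flagged = sum(1 for flag in link_is_suspicious if flag)
    return flagged / len(link_is_suspicious)


# An encoder whose every link is suspicious gets a factor of 1.
only_suspicious = suspicion_factor([True, True, True])
mixed = suspicion_factor([True, False, False, False])

# Share of encoders with Suspicion Factor 1, using the counts from the slide.
share_percent = round(2018 / 12344 * 100, 2)
print(only_suspicious, mixed, share_percent)
```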


Promptness Analysis

Highly suspicious profiles: user has shortened at least 100 links and Suspicion Factor is 1

80 profiles

User: bamsesang, Month lag: 24
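The "highly suspicious" criterion above amounts to a two-condition filter over encoder profiles. The sketch below assumes each profile already carries a link count and a Suspicion Factor; the `Encoder` record and the sample profiles are hypothetical.

```python
from typing import List, NamedTuple


class Encoder(NamedTuple):
    name: str
    links_shortened: int
    suspicion_factor: float


def highly_suspicious(encoders: List[Encoder], min_links: int = 100) -> List[Encoder]:
    """Keep profiles with at least `min_links` links and a Suspicion Factor of 1."""
    return [
        e for e in encoders
        if e.links_shortened >= min_links and e.suspicion_factor == 1.0
    ]


# Hypothetical profiles for illustration only.
sample = [
    Encoder("user_a", 250, 1.0),   # passes both conditions
    Encoder("user_b", 40, 1.0),    # too few links
    Encoder("user_c", 500, 0.8),   # not exclusively suspicious
]
flagged = highly_suspicious(sample)
print([e.name for e in flagged])
```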


Bitly’s response



Malicious Bitly Link Detection

Data Collection and Labeling

Data Collection
• Collect data: tweets from Twitter's REST API (412,139)
• Extract and expand bitly URLs (34,802)

Data Labeling
• Blacklist + Bitly Warning Check
• Blacklists: 1. Google Safebrowsing 2. SURBL 3. PhishTank 4. VirusTotal
• Labels: Malicious, Benign
• Datasets: labeled-dataset, unlabeled-dataset
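The extraction and labeling steps above can be sketched as follows. This is an illustration under stated assumptions: the regular expression for bit.ly links, the rule "malicious if any blacklist or the Bitly warning flags the URL", and the stand-in checker functions are all hypothetical, and the expansion step is omitted because it needs network access. Real blacklist services have their own APIs, which are not modeled here.

```python
import re
from typing import Callable, Iterable, List

# Assumed pattern for bit.ly short links appearing in tweet text.
BITLY_RE = re.compile(r"https?://bit\.ly/[A-Za-z0-9_-]+")


def extract_bitly_links(tweet_text: str) -> List[str]:
    """Return all bit.ly links found in a tweet, in order of appearance."""
    return BITLY_RE.findall(tweet_text)


def label_url(
    long_url: str,
    blacklist_checks: Iterable[Callable[[str], bool]],
    bitly_warning: bool = False,
) -> str:
    """Label a URL 'malicious' if any check or the Bitly warning fires, else 'benign'."""
    if bitly_warning or any(check(long_url) for check in blacklist_checks):
        return "malicious"
    return "benign"


# Stand-in checkers; each would wrap one blacklist lookup in practice.
def fake_blacklist_a(url: str) -> bool:
    return "evil.example" in url


def fake_blacklist_b(url: str) -> bool:
    return url.endswith(".exe")


links = extract_bitly_links("Check http://bit.ly/1aBcD and https://bit.ly/xyz_9!")
checks = [fake_blacklist_a, fake_blacklist_b]
label_1 = label_url("http://evil.example/login", checks)
label_2 = label_url("http://news.example/story", checks)
label_3 = label_url("http://news.example/story", checks, bitly_warning=True)
print(links, label_1, label_2, label_3)
```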


Feature Selection

No. Feature Name: Feature Description

1. Domain age: Difference between domain creation / update date and expiration date
2. Link creation - domain creation difference: Difference between domain creation date and bitly link creation date
3. Link creation hour: Bitly link creation hour
4. Number of encoders: Number of bitly users who encoded a particular link
5. Anonymous and API encoder ratio: Ratio of encoders that are "anonymous" or from a Twitter based application (Twitterfeed, TweetDeck, Tweetbot) to the total number of encoders
6. Link creation - first click difference: Difference in days between bitly link creation date and date of first click received
7. Referring domains - direct by total: Ratio of referring domains from a direct source to the total number of referring domains

Feature groups: WHOIS specific; Bitly specific (Non-click based, Click based)
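The seven features above can be computed from a per-link record roughly as follows. This is a sketch only: the record layout, the field names, the use of the string "direct" to mark direct referrers, and the choice of the domain creation date (rather than the update date) for feature 1 are assumptions made for illustration.

```python
from datetime import datetime
from typing import Any, Dict

# Encoder names counted as anonymous or Twitter based applications (from the slide).
ANON_OR_API = {"anonymous", "twitterfeed", "tweetdeck", "tweetbot"}


def extract_features(rec: Dict[str, Any]) -> Dict[str, Any]:
    """Compute the seven features for one link record (hypothetical layout)."""
    encoders = [e.lower() for e in rec["encoders"]]
    referrers = rec["referrers"]
    first_click = rec.get("first_click")
    return {
        "domain_age_days": (rec["domain_expires"] - rec["domain_created"]).days,
        "link_domain_diff_days": (rec["link_created"] - rec["domain_created"]).days,
        "link_creation_hour": rec["link_created"].hour,
        "num_encoders": len(encoders),
        "anon_api_ratio": (
            sum(1 for e in encoders if e in ANON_OR_API) / len(encoders)
            if encoders else 0.0
        ),
        # None when the link has never been clicked.
        "first_click_lag_days": (
            (first_click - rec["link_created"]).days if first_click else None
        ),
        "direct_referrer_ratio": (
            sum(1 for r in referrers if r == "direct") / len(referrers)
            if referrers else 0.0
        ),
    }


# Hypothetical record, not taken from the dataset.
record = {
    "domain_created": datetime(2013, 1, 1),
    "domain_expires": datetime(2014, 1, 1),
    "link_created": datetime(2013, 10, 5, 14, 30),
    "first_click": datetime(2013, 10, 7, 9, 0),
    "encoders": ["anonymous", "TweetDeck", "alice", "bob"],
    "referrers": ["direct", "twitter.com", "direct", "t.co"],
}
features = extract_features(record)
print(features)
```

Features 6 and 7 are only available for links that received clicks, which is why the sketch returns `None` or `0.0` when click data is missing.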


Evaluation Results

Experiment 1
Mixed dataset: click and non-click
All features
Precision (random forest): 81.20%

Experiment 2
Only non-click data
WHOIS + Non-click based features
Precision (random forest): 89.60%

[Figure: confusion matrices (TP, FP, FN, TN) for each experiment]
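Precision is derived from the confusion matrix cells named above: it is the share of predicted-malicious links that are truly malicious, TP / (TP + FP). The sketch below uses made-up counts purely for illustration; they are not the counts behind the reported 81.20% and 89.60%.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative counts only (hypothetical, not from the experiments).
tp, fp, fn, tn = 90, 10, 30, 870
p = precision(tp, fp)
r = recall(tp, fn)
print(p, r)
```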


Feature Ranks

Experiment 1

Rank Feature
1 Type of referring domains
3 Domain age
4 Link creation hour
5 Type of encoders
6 Link creation-click lag

Experiment 2

Rank Feature
1 Link creation hour
3 Domain age
4 Type of encoders


Conclusion & Future Work

• Restricted FB/Twitter connections per profile
• Credibility score per profile
• Bitly specific features in addition to blacklists
• Temporal pattern
• Broaden / generalize features for other URL shorteners
• Browser extension


Questions?

Thanks to Bitly
- Brian David Eoff
- Mark Josephson

Thank You!
