PhishDef: URL Names Say It All Anh Le, Athina Markopoulou University of California, Irvine USA Michalis Faloutsos University of California, Riverside USA


PhishDef: URL Names Say It All

Anh Le, Athina Markopoulou

University of California, Irvine, USA

Michalis Faloutsos

University of California, Riverside, USA

What is Phishing?

Anh Le - UC Irvine - PhishDef 2

• Uses social engineering and technical means to steal consumers’ personal identity, data, etc.

• Causes billions of dollars in losses annually

Most Targeted Industry Sectors, 2nd Quarter ‘10 (source: Antiphishing.org)

• Financial: 33.1%
• Payment Services: 37.9%
• Classifieds: 6.6%
• Auction: 5.5%
• Gaming: 4.6%
• Retail/Service: 3.6%
• Social Networking: 2.8%
• Government: 1.3%
• ISP: 1.2%
• Other: 3.4%

Example of a Phishing Site

Current Protection

• Google Safe Browsing

• Microsoft SmartScreen

• Third-Party

Current Protection Model

Motivation: Blacklist-based protection is reactive; it cannot protect against zero-day phishing

Google Safe Browsing

Outline

o Phishing Background

o Motivation

o Our proposal
    o New Protection Model
    o Learning Algorithms
    o Dataset
    o Feature Selection
    o Evaluation Results

o Concluding Remarks

Our Proposed Protection Model

• Main challenges: Accuracy and Classification Latency
    • Which classification algorithm works best?
    • Which set of features works best?

Prior Work

o Whittaker et al. [NDSS ’10]
    o Google Safe Browsing

o Ma et al. [SIGKDD ’09]
    o Batch-based Classification

o Ma et al. [ICML ’09]
    o Batch-based vs. Online Learning

Server-Side Classification

Main Contributions

o New Protection Model:
    o Client-side classification

o Propose using Adaptive Regularization of Weights (AROW)
    o High accuracy
    o Resilient to noise

o Set of Lexical Features
    o Fast to extract at client side
    o Obfuscation resistant

Machine Learning Algorithms

• Batch-based Support Vector Machine

• Online Perceptron

• Confidence-Weighted (CW) [Dredze et al., ICML 2008]

• Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]

Online Classification

• Maintain a weight vector and use it for classification

• Online Perceptron

Weight vector: trained beforehand. Feature vector: extracted in real time.

Client Side:

Server Side:
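The online Perceptron named on this slide can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the sparse dictionary representation, the function names (`dot`, `predict`, `train_perceptron`), and the label convention (+1 for phishing, -1 for benign) are all assumptions. Reading the slide labels, `predict` is the part that would presumably run on the client with a weight vector trained beforehand, while `train_perceptron` stands in for the training step.

```python
from typing import Dict, Iterable, Tuple

# Sparse feature vector: feature name -> value (an assumed representation).
Features = Dict[str, float]


def dot(w: Features, x: Features) -> float:
    """Inner product of the weight vector and a sparse feature vector."""
    return sum(w.get(name, 0.0) * value for name, value in x.items())


def predict(w: Features, x: Features) -> int:
    """Classify by the sign of w . x: +1 (phishing) or -1 (benign)."""
    return 1 if dot(w, x) >= 0 else -1


def train_perceptron(
    data: Iterable[Tuple[Features, int]], epochs: int = 1
) -> Features:
    """Mistake-driven training: on each error, add y * x to the weights."""
    examples = list(data)
    w: Features = {}
    for _ in range(epochs):
        for x, y in examples:
            if predict(w, x) != y:
                for name, value in x.items():
                    w[name] = w.get(name, 0.0) + y * value
    return w
```

For example, `train_perceptron([({"tok:login": 1.0, "bias": 1.0}, 1), ({"tok:news": 1.0, "bias": 1.0}, -1)], epochs=5)` returns a weight dictionary that `predict` can then apply to new feature vectors; the feature names here are hypothetical.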

Online Classification

• Confidence-Weighted (CW)
    o Minimum change
    o Enough to correct the last mistake

• Adaptive Regularization of Weights (AROW)
    o Minimum change
    o Penalty for mistake
    o Increasing confidence
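To make the AROW annotations concrete, here is a minimal sketch of an AROW-style learner. It is not the authors' code and it makes several assumptions: it keeps only a diagonal covariance (one variance per feature) instead of the full matrix, it uses a single regularization parameter `r`, and the class and method names are hypothetical. The update rules follow the closed-form AROW update as commonly stated for Crammer et al. (NIPS 2009): the mean moves only as far as the squared hinge loss demands (the penalty for a mistake), and the variance of every feature seen shrinks (increasing confidence).

```python
from typing import Dict

Features = Dict[str, float]  # sparse feature vector: name -> value


class DiagonalAROW:
    """AROW-style online classifier with a diagonal covariance (assumed)."""

    def __init__(self, r: float = 1.0) -> None:
        self.r = r                          # regularization parameter, r > 0
        self.mu: Dict[str, float] = {}      # mean weight per feature
        self.sigma: Dict[str, float] = {}   # variance per feature, default 1.0

    def margin(self, x: Features) -> float:
        return sum(self.mu.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x: Features) -> int:
        """Return +1 (phishing) or -1 (benign)."""
        return 1 if self.margin(x) >= 0 else -1

    def update(self, x: Features, y: int) -> None:
        """Process one labeled example (y is +1 or -1)."""
        m = self.margin(x)
        if m * y >= 1.0:
            return  # margin already large enough: no change
        # Confidence term x^T Sigma x under the diagonal approximation.
        conf = sum(self.sigma.get(k, 1.0) * v * v for k, v in x.items())
        beta = 1.0 / (conf + self.r)
        alpha = (1.0 - y * m) * beta  # step size from the hinge loss
        for k, v in x.items():
            s = self.sigma.get(k, 1.0)
            self.mu[k] = self.mu.get(k, 0.0) + alpha * y * s * v
            self.sigma[k] = s - beta * (s * v) ** 2  # variance shrinks
```

Because the step size is damped by `r` and by the accumulated confidence, a single mislabeled example shifts the weights less than a hard constraint would. That is one way to read the later claim that AROW is more resilient to noise than CW.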

Dataset

o Phishing URLs
    o PhishTank (4,082)
    o MalwarePatrol (2,001)

o Benign URLs
    o Open directory (4,012)
    o Yahoo directory (4,143)

o Time period: June 2010

Feature Selection

o Lexical Features

o External Features
    o Country, AS number, registration date, registrant, registrar, etc.
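The transcript does not preserve which lexical features the slide listed, so the following is only a sketch of what extracting lexical features from a URL string might look like. The specific features (lengths, dot count, bag of host and path tokens), the feature-name prefixes, and the function name `lexical_features` are all assumptions made for illustration. Unlike external features, everything here is computed from the URL string alone, with no remote lookup.

```python
import re
from typing import Dict
from urllib.parse import urlsplit

# Split on any run of non-alphanumeric characters (an assumed tokenization).
_DELIMS = re.compile(r"[^A-Za-z0-9]+")


def lexical_features(url: str) -> Dict[str, float]:
    """Build a sparse feature vector from the URL string only."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or ""
    feats: Dict[str, float] = {
        "bias": 1.0,
        "len:url": float(len(url)),
        "len:host": float(len(host)),
        "len:path": float(len(path)),
        "count:dots": float(url.count(".")),
    }
    # Bag-of-words features: one binary indicator per token and position.
    for token in _DELIMS.split(host.lower()):
        if token:
            feats["host:" + token] = 1.0
    for token in _DELIMS.split(path.lower()):
        if token:
            feats["path:" + token] = 1.0
    return feats
```

Calling `lexical_features("http://login.example.com/secure/update.htm")` (a made-up URL) yields indicators such as `host:login` and `path:update` alongside the numeric length features. A dictionary like this can be fed directly to a sparse online classifier.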

Outline

o Phishing Background

o Motivation

o Our proposal
    o New Protection Model
    o Learning Algorithms
    o Dataset
    o Feature Selection
    o Evaluation Results

o Concluding Remarks

Evaluation Results: Lexical vs. Full Features

Lexical features alone are better suited than full features for client-side phishing classification

Full features compared with lexical features alone:

(+) ~1% gain in accuracy

(-) Dependency on a remote server

(-) Avg. latency: 1.64 s

Evaluation Results: CW vs. AROW

AROW is more resilient to noise than CW

Conclusion: PhishDef

o Client-side phishing classification system
    o Proactive, on-the-fly classification of zero-day phishing URLs

o Low delay on the client side (ms), high accuracy (97%)

o Resilient to noisy data

o Future Work:
    o Develop an add-on for Firefox

Questions

Example of a Phishing Site

http://www.hmrc.gov.uk/intro-income-tax.htm

http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm
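The two URLs above differ visibly in their lexical structure, which a short sketch can quantify. The statistics chosen here (path length, path depth, longest path token) and the helper name `url_stats` are assumptions for illustration, not features confirmed by the slides. The first URL is presumably the legitimate page and the second the phishing one.

```python
import re
from typing import Dict, Union
from urllib.parse import urlsplit


def url_stats(url: str) -> Dict[str, Union[str, int]]:
    """Compute a few simple lexical statistics for one URL."""
    parts = urlsplit(url)
    # Tokens are maximal alphanumeric runs in the path (assumed tokenization).
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", parts.path) if t]
    return {
        "host": parts.hostname or "",
        "path_length": len(parts.path),
        "path_depth": parts.path.count("/"),
        "longest_path_token": max((len(t) for t in tokens), default=0),
    }


benign = url_stats("http://www.hmrc.gov.uk/intro-income-tax.htm")
phish = url_stats(
    "http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm"
)

for name, stats in (("benign", benign), ("phish", phish)):
    print(name, stats)
```

The second URL has a deeper path and a 40-character hexadecimal token, while the first has a short path made of ordinary words. Differences like these are visible from the URL name alone.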

Evaluation Results: Batch-Based vs. Online Learning

Online learning outperforms batch-based learning for phishing classification

Chrome 11 > Firefox 4
