phishing spie 2012 presentation - jsw - d2

A Method for Automated Detection of Phishing Websites: Through Both Site Characteristics and Image Analysis

Joshua S. White
Jeanna N. Matthews, PhD

Outline

• Problem
• Method
  – Image Analysis (in detail)
• Method Verification
• Results
• Conclusion
• References

Problem

• Phishing site detection
  – A largely manual process
    • Requires human visual review of each site to eliminate false positives / negatives
  – URLs come from actual phishing attempts
    • Email and other user-reported URLs
  – Analysis is reactive, not proactive

Method (Overview)

Method

• For rapid proof of concept
  – Data collected using the 140Dev PHP script and MySQL schema
• Page characteristics collected using PHP for DOM object parsing
  – Links, Images, Forms, Iframes, Meta Tags
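The characteristic collection step can be sketched as follows. The slides name PHP DOM parsing as the tool; this sketch uses Python's standard `html.parser` instead, and it assumes that a "characteristic" is simply the count of each listed element type, which the slides do not state. The names `TRACKED_TAGS`, `CharacteristicCounter`, and `page_characteristics` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Element types named on the slide, mapped to hypothetical report labels.
TRACKED_TAGS = {
    "a": "links",
    "img": "images",
    "form": "forms",
    "iframe": "iframes",
    "meta": "meta_tags",
}


class CharacteristicCounter(HTMLParser):
    """Counts occurrences of the tracked element types in one page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        label = TRACKED_TAGS.get(tag)
        if label is not None:
            self.counts[label] += 1


def page_characteristics(html: str) -> dict:
    """Return a count per tracked element type (assumed feature set)."""
    parser = CharacteristicCounter()
    parser.feed(html)
    parser.close()
    return {label: parser.counts[label] for label in TRACKED_TAGS.values()}


sample_page = (
    '<html><head><meta charset="utf-8"><meta name="robots" content="none"></head>'
    '<body><a href="/help">Help</a><img src="logo.png"/>'
    '<form action="/login"><input name="user"></form>'
    '<iframe src="frame.html"></iframe></body></html>'
)
print(page_characteristics(sample_page))
```

Two pages whose dictionaries are identical or close would be candidates for the image comparison; how close is close enough is not specified in the slides.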

Image Analysis

• Site screenshots collected using a headless web browser
  – CutyCapt, xvfb-run
• Hashing of resultant images
  – MD5Sum, SHA512, pHash
    • Final choice was pHash (Perceptual Hash)
      – Uses the discrete cosine transform (DCT)
        » Reduces sampling frequency
• Hamming distance used to compare hash values
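A minimal sketch of the comparison step, assuming each perceptual hash is held as a 64-bit integer. It also illustrates why a cryptographic digest such as MD5 is a poor fit here: a one-byte change in the input produces an unrelated digest, so the distance between digests says nothing about visual similarity. The threshold `max_distance=10` and the example hash values are made up for illustration and do not come from the slides.

```python
import hashlib


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions at which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_visual_match(hash_a: int, hash_b: int, max_distance: int = 10) -> bool:
    """Hypothetical decision rule: a small distance means 'looks the same'."""
    return hamming_distance(hash_a, hash_b) <= max_distance


# Cryptographic digests: nearly identical inputs, unrelated outputs.
digest_a = hashlib.md5(b"screenshot bytes").hexdigest()
digest_b = hashlib.md5(b"screenshot bytes.").hexdigest()
md5_bits_differing = hamming_distance(int(digest_a, 16), int(digest_b, 16))

# Perceptual hashes (made-up values): similar images give nearby hashes.
legit_hash = 0xF0F0F0F0F0F0F0F0
spoof_hash = 0xF0F0F0F0F0F0F0F1

print("MD5 bits differing:", md5_bits_differing)
print("pHash bits differing:", hamming_distance(legit_hash, spoof_hash))
print("visual match:", is_visual_match(legit_hash, spoof_hash))
```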

Image Analysis

Image Analysis

• Process:
  – Reduce the size of the image to 32 x 32
  – Reduce the color to greyscale
  – Calculate the DCT (creates frequency scalars)
  – Reduce the DCT to the 8 x 8 block of lowest-frequency coefficients
  – Second DCT reduction: set each bit to 1 or 0 depending on whether the coefficient is above or below the average DCT value
  – Take the hash from the resulting 64 bits
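The process above can be sketched in plain Python. This is an illustration under stated assumptions, not the pHash library's code: the input is assumed to be already resized to 32 x 32 and converted to greyscale (a list of 32 rows of 32 values), the DCT is an unnormalized DCT-II, and the average leaves out the first (DC) coefficient, a choice the slides do not specify. The function names are hypothetical.

```python
import math

IMAGE_SIDE = 32  # size after the resize step
HASH_SIDE = 8    # side of the low-frequency block that is kept


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Separable 2D DCT: transform every row, then every column."""
    row_pass = [dct_1d(row) for row in matrix]
    col_pass = [dct_1d(col) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]  # transpose back


def perceptual_hash(gray):
    """Return a 64-bit integer hash of a 32 x 32 greyscale matrix."""
    if len(gray) != IMAGE_SIDE or any(len(row) != IMAGE_SIDE for row in gray):
        raise ValueError("expected a 32 x 32 greyscale matrix")
    coeffs = dct_2d(gray)
    # Keep the 8 x 8 block of lowest frequencies (top-left corner).
    low = [coeffs[u][v] for u in range(HASH_SIDE) for v in range(HASH_SIDE)]
    # Assumption: average over the 63 non-DC coefficients only.
    mean = sum(low[1:]) / (len(low) - 1)
    value = 0
    for coefficient in low:
        value = (value << 1) | (1 if coefficient > mean else 0)
    return value


# Synthetic test images: a horizontal ramp, a brighter copy, a vertical ramp.
horizontal = [[x * 8 for x in range(IMAGE_SIDE)] for _ in range(IMAGE_SIDE)]
brighter = [[pixel + 10 for pixel in row] for row in horizontal]
vertical = [[y * 8 for _ in range(IMAGE_SIDE)] for y in range(IMAGE_SIDE)]

print(format(perceptual_hash(horizontal), "016x"))
print(perceptual_hash(horizontal) == perceptual_hash(brighter))
print(perceptual_hash(horizontal) == perceptual_hash(vertical))
```

A uniform brightness change only shifts the DC coefficient, so the brighter copy hashes the same as the original ramp, while the vertical ramp does not. In practice the resize and greyscale steps would be done with an image library before calling `perceptual_hash`.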

Method Verification

Results

• After our method was verified, we concentrated on the top 5 most spoofed sites:
• Some false characteristic matches:

Conclusion

• Phishing URL posting on social media networks is a growing problem
• We have developed a tool that quickly and effectively detects matches between legitimate and spoofed sites
• Future work includes:
  – Integration of our characteristic mapping and image analysis technique into our social media analytics toolkit

Questions?

References
