phishing spie 2012 presentation - jsw - d2
A Method for Automated Detection of Phishing Websites: Through Both Site Characteristics and Image Analysis
Joshua S. White, Jeanna N. Matthews, PhD
Outline
• Problem
• Method
  – Image Analysis (in detail)
• Method Verification
• Results
• Conclusion
• References
Problem
• Phishing site detection
  – A largely manual process
    • Requires human visual review of the site to eliminate false positives / negatives
  – URLs come from actual phishing attempts
    • Email and other user-reported URLs
  – Analysis is reactive, not proactive
Method (Overview)
Method
• For rapid proof of concept
  – Data collected using the 140Dev PHP script and MySQL schema
• Page characteristics collected using PHP for DOM object parsing
  – Links, Images, Forms, Iframes, Meta Tags
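The characteristic collection described above can be illustrated with a short sketch. The slides say the authors used PHP for DOM parsing; the Python below is only a stand-in for illustration, and the names `CharacteristicCounter` and `page_characteristics` are hypothetical. It assumes a "characteristic" is simply a count of each listed element type, which the slides do not confirm.

```python
from collections import Counter
from html.parser import HTMLParser

# Map HTML tags to the characteristics named on the slide.
TRACKED = {
    "a": "links",
    "img": "images",
    "form": "forms",
    "iframe": "iframes",
    "meta": "meta_tags",
}


class CharacteristicCounter(HTMLParser):
    """Counts how often each tracked tag opens in a document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        # Start every characteristic at zero so missing tags still appear.
        self.counts = Counter({name: 0 for name in TRACKED.values()})

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img/> also reach this method.
        if tag in TRACKED:
            self.counts[TRACKED[tag]] += 1


def page_characteristics(html):
    """Return a dict of characteristic counts for one page's HTML."""
    parser = CharacteristicCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)
```

Two pages could then be compared by checking how closely their count dictionaries agree. How the authors actually scored characteristic matches is not stated in the slides.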
Image Analysis
• Site images collected using a headless web browser
  – CutyCapt, xvfb-run
• Hashing of resultant images
  – MD5Sum, SHA512, PHash
    • Final choice was PHash (Perceptual Hash)
      – Uses the discrete cosine transform (DCT)
        » Reduces sampling frequency
• Hamming distance used to compare each hash value
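The Hamming distance comparison can be sketched as follows, assuming each perceptual hash is held as an integer (64 bits in the usual pHash layout). The function names and the `threshold` default are hypothetical; the slides do not give the cutoff the authors used.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bit positions in which two hash values differ."""
    # XOR leaves a 1 wherever the bits disagree; count those 1s.
    return bin(hash_a ^ hash_b).count("1")


def is_probable_match(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
    """True when two hashes differ in at most `threshold` bits.

    The default threshold is an illustrative assumption, not a value
    taken from the presentation.
    """
    return hamming_distance(hash_a, hash_b) <= threshold
```

This comparison is only meaningful for a perceptual hash. With MD5 or SHA512, a one-pixel change in the screenshot alters roughly half the output bits, so a small distance never signals visual similarity.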
Image Analysis
• Process:
  – Reduce the size of the image to 32 x 32
  – Reduce the color to greyscale
  – Calculate the DCT (creates frequency scalars)
  – Reduce the DCT to 8 x 8 (keep the low-frequency coefficients)
  – Second DCT reduction: set bits to 1 or 0 depending on placement above or below the average DCT value
  – Take hash
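The process above can be sketched in plain Python. This is an illustration under stated assumptions, not the pHash library's implementation: the image is assumed to be a list of rows of `(r, g, b)` tuples, resizing uses simple box averaging, the DCT is an unnormalized DCT-II, and the average excludes the first (DC) coefficient, a common choice that the slides do not specify. All function names are hypothetical.

```python
import math


def to_gray_32(pixels):
    """Box-average an RGB image (rows of (r, g, b)) down to 32 x 32 greyscale."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(32):
        y0 = y * h // 32
        y1 = max((y + 1) * h // 32, y0 + 1)
        row = []
        for x in range(32):
            x0 = x * w // 32
            x1 = max((x + 1) * w // 32, x0 + 1)
            total, n = 0.0, 0
            for yy in range(y0, y1):
                for xx in range(x0, x1):
                    r, g, b = pixels[yy][xx]
                    # Standard luma weights for RGB-to-grey conversion.
                    total += 0.299 * r + 0.587 * g + 0.114 * b
                    n += 1
            row.append(total / n)
        out.append(row)
    return out


def dct_low(gray, size=32, keep=8):
    """Top-left keep x keep block of an unnormalized 2-D DCT-II."""
    coeffs = []
    for u in range(keep):
        row = []
        for v in range(keep):
            s = 0.0
            for y in range(size):
                cy = math.cos((2 * y + 1) * u * math.pi / (2 * size))
                for x in range(size):
                    cx = math.cos((2 * x + 1) * v * math.pi / (2 * size))
                    s += gray[y][x] * cy * cx
            row.append(s)
        coeffs.append(row)
    return coeffs


def phash(pixels):
    """64-bit perceptual hash: one bit per low-frequency DCT coefficient."""
    flat = [c for row in dct_low(to_gray_32(pixels)) for c in row]
    # Assumption: the mean skips the DC term, which would otherwise dominate.
    mean = sum(flat[1:]) / (len(flat) - 1)
    bits = 0
    for c in flat:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits
```

Because a uniform brightness change only affects the DC coefficient, two screenshots that differ in overall brightness should produce nearly identical hashes, while an unrelated page should differ in many bits.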
Method Verification
Results
• After our method was verified, we concentrated on the top 5 most spoofed sites:
• Some False Characteristic Matches:
Conclusion
• Phishing URL posting on social media networks is a growing problem
• We have developed a tool that quickly and effectively detects matches between legitimate and spoofed sites
• Future work includes:
  – Integration of our characteristic mapping and image analysis technique into our social media analytics toolkit
Questions?
References