detecting phishing in emails
Post on 21-Jan-2016
Detecting Phishing in Emails
Srikanth Palla
Ram Dantu
University of North Texas, Denton
What is Phishing?
Phishing is a form of online identity theft
Employs both social engineering and technical subterfuge
Targets consumers' personal identity data and financial account credentials such as credit card numbers, account usernames, passwords and social security numbers
Social-engineering schemes use 'spoofed' e-mails to lead consumers to counterfeit websites.
- Anti Phishing Working Group (APWG)
Phishing Tactics
Hijacking reputable brand names
Creating a plausible premise
Redirecting URLs
Collecting confidential information through emails
Do we need to restrict Phishing attacks?
The Statistics…
Sources: Anti Phishing Working Group
Problems with Current Spam Filtering Techniques
Current spam filters focus on analyzing the content
The majority of phishers obfuscate their email content to bypass email filters
Current filters label an email as BULK and expect the recipient to decide on the authenticity of the email source
Current spam filters have a high rate of false positives
Methodology
Our method examines:
The header of the email (not content)
The social network of the recipient
Credibility of the source
Classifies Phishers as:
Prospective Phishers
Recent Phishers
Suspects
Serial Phishers
Traffic Profile
The following Figure describes the incoming email traffic profiles based on number of recipients and how often they receive the message.
[Figure: incoming email traffic profile. Axes: number of recipients in an enterprise vs. frequency of emails arriving. Scale labels: productivity gain, productivity loss. Region labels: legitimate, optional, annoyance/counterfeit/nuisance. Traffic types shown: individual discussions, professional discussions, business discussions, discussion threads, professional/business announcements, good news, news groups, personal club invitations, strangers, telemarketing, phishing.]
Email Corpus Traffic Profile
Our analysis requires the recipient's sent email folder
The emails provided in the TREC evaluation tool kit are spam and non-spam emails
We require a mix of legitimate and phishing emails to evaluate our filter
We analyzed a live corpus of 13,843 emails collected over 2.5 years. This corpus has a mix of legitimate, spam and phishing emails. The different categories of emails are shown in the figure
Experimental Setup
We deployed our classifier on a recipient's local machine running an IMAP proxy and Thunderbird (MUA).
All of the recipient's emails were fed directly into our classifier by the proxy.
Our classifier periodically scans the user's mailbox files for new incoming emails.
DNS-based header analysis, social network analysis and wantedness analysis were performed on each email.
The end result is that each email is tagged as one of: Phishing, Opt-outs, Socially distinct or Socially close.
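The periodic mailbox scan described above could look roughly like the following. This is a minimal sketch, not the authors' implementation: it assumes the mailbox is stored in mbox format (as Thunderbird does for local folders), and the function name `new_messages` and the `seen_keys` set are hypothetical.

```python
import mailbox
from email.message import Message
from typing import Iterator, Set


def new_messages(mbox_path: str, seen_keys: Set[str]) -> Iterator[Message]:
    """Yield messages that were not returned by an earlier scan.

    `seen_keys` is kept by the caller between scans (hypothetical design).
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Prefer Message-ID; fall back to a few headers if it is missing.
            key = msg.get("Message-ID") or "|".join(
                str(msg.get(h)) for h in ("From", "Date", "Subject")
            )
            if key in seen_keys:
                continue
            seen_keys.add(key)
            yield msg
    finally:
        box.close()
```

Each call returns only the messages that arrived since the previous call, which can then be passed on to the three analyses.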
Architecture
The architecture model of our classifier consists of three analyses followed by a classification step
Step 1: DNS-based header analysis
Step 2: Social network analysis
Step 3: Wantedness analysis
Step 4: Classification
[Figure: architecture diagram. Blocks: Mail Box, DNS-based Header Analysis, Social Network Analysis, Wantedness Analysis, Classification Based on Wantedness and Credibility. Outputs: Phishing, Opt-outs, Socially close, Socially distinct. User feedback is fed back into the analyses.]
Step 1: DNS-based Header Analysis
Stage 1: We validate the information provided in the email header: the hostname position of the sender, the mail server and the relays in the rest of the path. We divide the entire corpus into two buckets:
Bucket 1: emails that are valid for DNS lookups.
Bucket 2: emails that are not valid for DNS lookups.
Stage 2: We perform a DNS lookup on the hostname given in each Received: line of the header and compare the IP address returned with the IP address that the relay stored next to the hostname during the SMTP authorization process. Bucket 1 is further divided into:
Trusted bucket.
Untrusted bucket.
We pass Bucket 2 and both the trusted and untrusted buckets to the Social Network Analysis phase for further analysis.
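The two stages can be illustrated with a short sketch. It is an assumption-laden simplification, not the authors' code: the regular expression only handles the common `from host (name [ip])` layout of Received: lines, the bucket names are returned as plain strings, and the resolver is passed in as a parameter (hypothetical design) so the logic can be exercised without network access.

```python
import re
import socket
from email import message_from_string
from typing import Callable, List, Tuple

# Matches "from <hostname> (... [<ip>])" in a Received: header value.
RECEIVED_RE = re.compile(r"from\s+(\S+)\s+\([^\[\]]*\[([0-9A-Fa-f.:]+)\]")


def default_resolver(hostname: str) -> List[str]:
    """Return the IPv4 addresses DNS reports for hostname ([] on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def received_hops(raw_email: str) -> List[Tuple[str, str]]:
    """Extract (hostname, recorded_ip) pairs from the Received: lines."""
    msg = message_from_string(raw_email)
    hops = []
    for value in msg.get_all("Received", []):
        match = RECEIVED_RE.search(value)
        if match:
            hops.append((match.group(1), match.group(2)))
    return hops


def dns_bucket(raw_email: str,
               resolve: Callable[[str], List[str]] = default_resolver) -> str:
    """Return 'bucket2', 'trusted' or 'untrusted' for one email."""
    hops = received_hops(raw_email)
    resolved = [(ip, resolve(host)) for host, ip in hops]
    # Stage 1: no usable hop, or a hostname that does not resolve.
    if not resolved or any(not ips for _ip, ips in resolved):
        return "bucket2"
    # Stage 2: every recorded IP must match a DNS answer for its hostname.
    if all(ip in ips for ip, ips in resolved):
        return "trusted"
    return "untrusted"
```

Passing a dictionary-backed resolver in place of `default_resolver` makes it possible to replay a stored corpus offline.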
Step 2: Social Network Analysis
Each of the three buckets (Bucket 2, the untrusted bucket and the trusted bucket) received from the DNS-based header analysis is treated with rules formulated by analyzing the emails in the receiver's "sent" folder.
For instance:
All emails from trusted domains will be removed
Familiarity to sender's community
Familiarity to the path traversed
The rules can be built as per the recipient's email filtering preferences.
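As an illustration of a rule derived from the "sent" folder, the sketch below treats every domain the recipient has written to as a trusted domain. How the slides actually derive trusted domains is not stated, so this derivation is an assumption, and the names `trusted_domains` and `is_socially_trusted` are hypothetical.

```python
from email.message import Message
from email.utils import getaddresses, parseaddr
from typing import Iterable, Set


def _domain(address: str) -> str:
    """Return the lower-cased domain of an address, or '' if it has none."""
    return address.rsplit("@", 1)[1].lower() if "@" in address else ""


def trusted_domains(sent_messages: Iterable[Message]) -> Set[str]:
    """Collect the domains of everyone the recipient has sent mail to."""
    domains = set()
    for msg in sent_messages:
        headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        for _name, address in getaddresses(headers):
            if _domain(address):
                domains.add(_domain(address))
    return domains


def is_socially_trusted(msg: Message, domains: Set[str]) -> bool:
    """True if the sender's domain appears among the trusted domains."""
    _name, address = parseaddr(msg.get("From", ""))
    return _domain(address) in domains and _domain(address) != ""
```

Other rules, such as familiarity with the path traversed, would need their own checks against the Received: lines.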
Classification of Trusted and Untrusted Senders
[Figure: classification flow. Email corpus (size: 13843) is split by DNS lookup into Valid (size: 13087) and Invalid (size: 1875). Each branch is split into Socially Trusted and Socially Untrusted. Untrusted emails (size: 563) are classified as Phishers or Opt-outs. Trusted emails (size: 13280) are classified as Socially Wanted or Socially Unwanted.]
Step 3: Wantedness Analysis
Measuring the sender's credibility (ρ):
We believe the credibility of a sender depends on the nature of his recent emails
If the recent emails sent by the sender are legitimate, his credibility increases
If the recent emails from the sender are fraudulent, his fraudulency increases
[Figure: Credibility Drops As Time Progresses for Untrusted Senders.]
Computing Credibility
\[ \text{Belief: } \tau = \frac{1}{\Delta T_{\text{legitimate emails}}} \tag{1} \]
\[ \text{Disbelief: } \hat{\tau} = \frac{1}{\Delta T_{\text{fraudulent emails}}} \tag{2} \]
\[ \text{Credibility: } \rho = \frac{\tau}{\hat{\tau}} = \frac{\Delta T_{\text{fraudulent emails}}}{\Delta T_{\text{legitimate emails}}} \tag{3} \]
\[ \text{Fraudulency: } \hat{\rho} = 1 - \rho \tag{4} \]
ΔT (legitimate emails) is the average time period of all legitimate emails w.r.t. the most recent email
ΔT (fraudulent emails) is the average time period of all fraudulent emails w.r.t. the most recent email
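A small numeric sketch may help make these quantities concrete. It takes belief as 1/ΔT(legitimate emails), disbelief as 1/ΔT(fraudulent emails) and credibility as their ratio, and it reads "time period w.r.t. the most recent email" as the gap between each email's timestamp and the newest timestamp from that sender. That reading, the `floor` parameter that guards against division by zero, and the function names are assumptions.

```python
from typing import Sequence


def mean_age(timestamps: Sequence[float], most_recent: float) -> float:
    """Average gap between each timestamp and the most recent email."""
    return sum(most_recent - t for t in timestamps) / len(timestamps)


def credibility(legit_times: Sequence[float],
                fraud_times: Sequence[float],
                floor: float = 1e-9) -> float:
    """Credibility = belief / disbelief for one sender.

    Timestamps are in any consistent unit (for example, days).
    `floor` is a hypothetical guard so that a zero average does not divide by zero.
    """
    if not legit_times or not fraud_times:
        raise ValueError("need at least one legitimate and one fraudulent email")
    most_recent = max(max(legit_times), max(fraud_times))
    dt_legit = max(mean_age(legit_times, most_recent), floor)
    dt_fraud = max(mean_age(fraud_times, most_recent), floor)
    belief = 1.0 / dt_legit
    disbelief = 1.0 / dt_fraud
    return belief / disbelief
```

Under these assumptions a sender whose fraudulent emails are the recent ones has a small ΔT(fraudulent emails), hence a high disbelief and a low credibility, which matches the behaviour described for untrusted senders.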
Credibility of Untrusted Senders
[Figure: credibility value (y-axis, 0 to 0.8) of untrusted senders (x-axis, 0 to 120). Labels: Threshold, Phishers, Opt-outs, O1, O2, O3, low-credibility domains (e.g. www.ebay.com, www.paypal.com), high-credibility domains.]
Measuring Recipient’s Wantedness
Tolerance (α+) for a sender is higher if the recipient reads his emails and stores them for a longer period
Intolerance (β-) for a sender is higher if the recipient deletes his emails without reading them
Measuring Wantedness
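\[ \text{Recipient's Tolerance: } \alpha \propto T_{rd} \]
\[ \text{Recipient's Tolerance: } \alpha \propto \frac{1}{\Delta T_{\text{legitimate emails}}} \]
\[ \text{Recipient's Intolerance: } \beta \propto T_{urd} \]
\[ \text{Recipient's Intolerance: } \beta \propto \frac{1}{\Delta T_{\text{fraudulent emails}}} \]
\[ \text{Wantedness: } \chi_R = \frac{\text{Tolerance}}{\text{Intolerance}} = \frac{\alpha}{\beta} \propto \frac{T_{rd}}{T_{urd}} \cdot \frac{\Delta T_{\text{fraudulent emails}}}{\Delta T_{\text{legitimate emails}}} \]
\[ \text{Unwantedness: } \gamma_R = 1 - \chi_R \]
ΔT (legitimate emails) is the average time period of all legitimate emails w.r.t. the most recent email
ΔT (fraudulent emails) is the average time period of all fraudulent emails w.r.t. the most recent email
Trd is the average storage time period of all the read emails
Turd is the average storage time period of all unread emails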
Wantedness of Trusted Senders
Classification
Classification of Phishers: Credibility Vs Phishing Frequency
Classification of Trusted Senders: Credibility Vs Wantedness
Classification of Phishers
[Figure: fraudulency (y-axis, 0 to 1) vs. phishing frequency (x-axis, 0 to 30). Regions: Prospective Phishers, Suspects, Recent Phishers, High Risk, Phishers Under Review.]
Classification of Trusted Senders
[Figure: credibility (y-axis, 0 to 1) vs. wantedness (x-axis, 0 to 1). Regions: Spammers, Phishers, Telemarketers; High Risk; Strangers; Socially Distinct; Socially Close; Opt-Ins; Family, Friends etc.]
Summary of Results
Corpus     Analysis                                                        # of emails     False Positives  False Negatives  Precision
Corpus-I   DNS Analysis                                                    11968           260              0                85%
Corpus-I   DNS Analysis + Social Network Analysis                          2548            3                5                95.6%
Corpus-I   DNS Analysis + Social Network Analysis + Wantedness Analysis    563 (Domains)   3                1                98.4%
Corpus-II  DNS Analysis                                                    756             5                0                90.4%
Corpus-II  DNS Analysis + Social Network Analysis                          59              0                0                93.75%
Corpus-II  DNS Analysis + Social Network Analysis + Wantedness Analysis    148             1                0                99.2%
Precision is the percentage of messages that were classified as phishing that actually are phishing
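The definition above corresponds to the standard precision formula, true positives divided by everything that was flagged. The sketch below is illustrative only; the argument names are assumptions, and the counts in the usage line are made up rather than taken from the results table.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Percentage of messages classified as phishing that really are phishing."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no messages were classified as phishing")
    return 100.0 * true_positives / flagged


# Hypothetical counts: 95 correctly flagged emails and 5 false alarms.
print(precision(95, 5))
```

Note that false negatives do not enter this measure; they affect recall instead.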
Conclusions
Phishers use special software to conceal the path taken by their emails to reach the recipient. Most of the time the path length is a single hop.
Our classifier can be used in conjunction with any existing spam filtering techniques for restricting spam and phishing emails
Rather than labeling an email as BULK, we further classify senders, based on their credibility and wantedness, as:
Prospective phishers
Suspects
Recent phishers
Serial phishers
We classified two different email corpora with a precision of 98.4% and 99.2% respectively