Spam: An Analysis of Spam Filters


Upload: matthew-thornton

Post on 03-Jan-2016


Page 1: Spam:  An Analysis of Spam Filters

Spam: An Analysis of Spam Filters

Joe Chiarella

Jason O’Brien

Advisors: Professor Wills and Professor Claypool

Page 2: Spam:  An Analysis of Spam Filters

Project Goals

To analyze the effectiveness of different kinds of spam filters.

Focused on SpamAssassin and Bogofilter.

Page 3: Spam:  An Analysis of Spam Filters

SpamAssassin

Rule-based filter – over 400 rules.

Each rule has an associated weight.

Score of an email is the sum of weights across all matching rules.

User-adjustable threshold.
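The scoring scheme on this slide (sum the weights of every matching rule, then compare against a threshold) can be sketched roughly as follows. This is a minimal illustration, not SpamAssassin's implementation: the rule names, patterns, and weights in `RULES` are hypothetical, and the default threshold of 5.0 is only an assumed value that the user would adjust.

```python
import re

# Hypothetical rules for illustration only: (name, pattern, weight).
# These are NOT SpamAssassin's actual rules or scores.
RULES = [
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.5),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z !]+$", re.MULTILINE), 1.5),
    ("HAS_UNSUBSCRIBE", re.compile(r"unsubscribe", re.IGNORECASE), 0.5),
]


def score_email(text, rules=RULES):
    """Return the summed weight of all matching rules and their names."""
    matched = [(name, weight) for name, pattern, weight in rules
               if pattern.search(text)]
    total = sum(weight for _, weight in matched)
    return total, [name for name, _ in matched]


def is_spam(text, threshold=5.0, rules=RULES):
    """Flag the email as spam when its score reaches the threshold.

    The threshold is the user-adjustable knob; 5.0 is an assumed default.
    """
    score, _ = score_email(text, rules)
    return score >= threshold
```

Lowering the threshold catches more spam at the cost of flagging more ham, which is why the slide lists it as user adjustable.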

Page 4: Spam:  An Analysis of Spam Filters

Bogofilter

Bayesian filter.

Calculates probability that an email is spam using past email.

Looks at frequency of words (not order of words).

Accuracy should improve over time.
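To make the idea on this slide concrete, here is a simplified word-frequency Bayesian classifier. It is a sketch under stated assumptions, not Bogofilter's actual algorithm: the class name `WordFrequencyFilter`, the whitespace tokenizer, and the add-one smoothing are all choices made for illustration. It does show the two properties the slide names: the estimate comes from previously seen email, and word order is ignored.

```python
import math
from collections import Counter


class WordFrequencyFilter:
    """Simplified naive Bayes filter (illustrative, not Bogofilter itself)."""

    def __init__(self):
        # Number of training messages of each class containing each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Unique lowercase words, sorted so results do not depend on order.
        return sorted(set(text.lower().split()))

    def train(self, text, label):
        """Record one past email; label is 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | words) from the emails seen so far."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        if n_spam + n_ham == 0:
            return 0.5  # nothing learned yet
        # Work in log-odds; add-one smoothing avoids division by zero.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in self.tokenize(text):
            p_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-700.0, min(700.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Because every call to `train` shifts the word counts, the estimates depend on how much mail has been seen, which is the sense in which accuracy should improve over time.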

Page 5: Spam:  An Analysis of Spam Filters

Data Collection

Email collected from students, professors, small business employees, and free email accounts.

4626 ham emails, 5010 spam emails, separated into ham and spam mailboxes for each user.

Page 6: Spam:  An Analysis of Spam Filters

Methodology

Compared accuracy of SpamAssassin and Bogofilter for each user’s email.

Tested same number of ham emails and spam emails from each user.

Ignored results from first 50 emails to allow Bogofilter to learn.
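One way the procedure on this slide could be organized is sketched below. The slides do not describe the actual test harness, so everything here is an assumption: the helper names `balanced_interleave` and `evaluate`, the alternating ham/spam order, and training on each message right after it is classified.

```python
def balanced_interleave(ham, spam):
    """Pair equal numbers of ham and spam messages, alternating them.

    Returns a list of (text, is_spam) tuples; extra messages from the
    larger mailbox are dropped (an assumed way to balance the classes).
    """
    n = min(len(ham), len(spam))
    mixed = []
    for h, s in zip(ham[:n], spam[:n]):
        mixed.append((h, False))
        mixed.append((s, True))
    return mixed


def evaluate(messages, classify, learn=None, warmup=50):
    """Measure per-class accuracy, ignoring the first `warmup` results.

    classify(text) -> bool (True means spam); learn(text, is_spam) is an
    optional hook so a learning filter can train on each message.
    """
    correct = {"ham": 0, "spam": 0}
    total = {"ham": 0, "spam": 0}
    for index, (text, is_spam) in enumerate(messages):
        predicted = classify(text)
        if index >= warmup:  # results during warm-up are not counted
            key = "spam" if is_spam else "ham"
            total[key] += 1
            if predicted == is_spam:
                correct[key] += 1
        if learn is not None:
            learn(text, is_spam)
    return {key: (correct[key] / total[key] if total[key] else None)
            for key in total}
```

Running `evaluate` once per user and per filter yields separate ham and spam accuracy figures of the kind the comparison slides report by user type.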

Page 7: Spam:  An Analysis of Spam Filters

Comparison of Bogofilter and SpamAssassin on Ham

CP = Company Person

PR = Professor

ST = Student

FE = Free Email

Page 8: Spam:  An Analysis of Spam Filters

Comparison of Bogofilter and SpamAssassin on Spam

CP = Company Person

PR = Professor

ST = Student

FE = Free Email

Page 9: Spam:  An Analysis of Spam Filters

SpamAssassin Score Analysis

Page 10: Spam:  An Analysis of Spam Filters

Conclusion

The effectiveness of Bogofilter and SpamAssassin depends greatly on the user.

Neither filter outperformed the other in all cases.

Filtering spam is hard.

Page 11: Spam:  An Analysis of Spam Filters

Questions?