Spam: An Analysis of Spam Filters
Joe Chiarella
Jason O’Brien
Advisors: Professor Wills and Professor Claypool
Project Goals
To analyze the effectiveness of different kinds of spam filters.
Focused on SpamAssassin and Bogofilter.
SpamAssassin
Rule-based filter with over 400 rules.
Each rule has an associated weight.
Score of an email is the sum of weights across all matching rules.
User-adjustable threshold.
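The scoring scheme described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four rules, their names, their weights, and the default threshold of 5.0 are hypothetical stand-ins, not SpamAssassin's actual rule set or code.

```python
import re

# Hypothetical rules for illustration only; these are NOT SpamAssassin's rules.
# Each rule is (name, pattern, weight). A negative weight marks a ham indicator.
RULES = [
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.I), 2.5),
    ("MANY_EXCLAMATIONS", re.compile(r"!{3,}"), 1.0),
    ("CLICK_HERE", re.compile(r"click here", re.I), 1.5),
    ("FROM_KNOWN_DOMAIN", re.compile(r"^From: .*@example\.edu$", re.M), -2.0),
]


def score_email(text):
    """Return (score, names of matching rules) for one raw email."""
    matched = [(name, weight) for name, pattern, weight in RULES
               if pattern.search(text)]
    score = sum(weight for _, weight in matched)
    return score, [name for name, _ in matched]


def is_spam(text, threshold=5.0):
    """Flag the email when its summed score reaches the user's threshold."""
    score, _ = score_email(text)
    return score >= threshold
```

Raising the threshold makes the filter more conservative: the same message can be flagged at one threshold and passed at a higher one, which is the trade-off the user-adjustable setting exposes.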
Bogofilter
Bayesian filter.
Calculates probability that an email is spam using past email.
Looks at frequency of words (not order of words).
Accuracy should improve over time.
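To make the idea concrete, here is a minimal sketch of a word-frequency Bayesian classifier. It is a simplified naive Bayes model with add-one smoothing, chosen for brevity, and is not Bogofilter's actual algorithm; the class name `WordFrequencyFilter` and its methods are hypothetical.

```python
import math
import re
from collections import Counter


class WordFrequencyFilter:
    """Simplified naive Bayes sketch; not Bogofilter's real implementation."""

    def __init__(self):
        # Number of training messages of each class that contain each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # A set of words: word order is deliberately discarded.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        """Learn from one past email; label is "spam" or "ham"."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the email is spam."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        if n_spam + n_ham == 0:
            return 0.5  # nothing learned yet, so no opinion
        # Work in log-odds so many small probabilities do not underflow.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in self.tokenize(text):
            p_word_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_word_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_word_spam / p_word_ham)
        # Clamp before converting back so math.exp cannot overflow.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Each call to `train` updates the word counts, so the estimates sharpen as more of the user's own mail is seen, which is why accuracy is expected to improve over time.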
Data Collection
Email collected from students, professors, small business employees, and free email accounts.
4626 ham emails, 5010 spam emails, separated into ham and spam mailboxes for each user.
Methodology
Compared accuracy of SpamAssassin and Bogofilter for each user’s email.
Tested same number of ham emails and spam emails from each user.
Ignored results from first 50 emails to allow Bogofilter to learn.
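A per-user evaluation loop along these lines might look like the sketch below. The slides do not say how messages were ordered or fed to the filters, so the shuffling, the `evaluate_user` function, and its `classify` and `learn` callbacks are all assumptions made for illustration.

```python
import random


def evaluate_user(ham, spam, classify, learn=None, warmup=50, seed=0):
    """Score one user's mail with a filter, ignoring a warm-up period.

    ham, spam : lists of message strings from the user's two mailboxes
    classify  : hypothetical callback, message -> True if judged spam
    learn     : optional hypothetical callback (message, is_spam) that lets
                a learning filter train on each message after it is judged
    """
    # Use the same number of ham and spam messages for this user.
    n = min(len(ham), len(spam))
    messages = [(m, False) for m in ham[:n]] + [(m, True) for m in spam[:n]]
    # Assumption: messages arrive interleaved in random order.
    random.Random(seed).shuffle(messages)

    correct = {True: 0, False: 0}
    scored = {True: 0, False: 0}
    for index, (message, is_spam) in enumerate(messages):
        predicted = classify(message)
        # Results from the first `warmup` messages are not counted.
        if index >= warmup:
            scored[is_spam] += 1
            correct[is_spam] += int(predicted == is_spam)
        if learn is not None:
            learn(message, is_spam)

    return {
        "ham_accuracy": correct[False] / scored[False] if scored[False] else None,
        "spam_accuracy": correct[True] / scored[True] if scored[True] else None,
    }
```

Reporting ham and spam accuracy separately matches the two comparison charts in the slides, and keeps a filter that flags everything from looking good on spam alone.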
Comparison of Bogofilter and SpamAssassin on Ham
CP = Company Person
PR = Professor
ST = Student
FE = Free Email
Comparison of Bogofilter and SpamAssassin on Spam
CP = Company Person
PR = Professor
ST = Student
FE = Free Email
SpamAssassin Score Analysis
Conclusion
The effectiveness of Bogofilter and SpamAssassin depends greatly on the user.
Neither filter outperformed the other in all cases.
Filtering spam is hard.
Questions?