Bayesian Filtering Team Glyph Debbie Bridygham Pravesvuth Uparanukraw Ronald Ko Rihui Luo Thuong Luu

Upload: esmond-phelps

Post on 18-Jan-2016


Bayesian Filtering

Team Glyph

Debbie Bridygham, Pravesvuth Uparanukraw, Ronald Ko, Rihui Luo, Thuong Luu

Background

•A strong need exists to identify “bad” items in a population and remove them. Examples: spam, unsolicited IMs, etc.

•Filtering often results in an “arms race” that requires rapid response

•An “arms race” favors inherently adaptive methods over others


Benefits of Filters

•Less unwanted traffic, thus less wasted space on clients & servers

•Greater use of internet services due to reduced customer frustration

•Some protection against dangerous traffic: scams, phishing attacks, viruses, etc.


Downsides of Filtering

•Excluding even one legitimate item (a false positive) is less desirable than letting 10 or more illegitimate items pass.

•Reducing the percentage of undesirable traffic often causes legitimate traffic to be excluded as well.

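The 10-to-1 preference above can be turned into a rejection threshold. The following is a minimal sketch, assuming that the filter outputs a probability p that an item is illegitimate and that the two kinds of error have fixed, additive costs. The function name and the cost values are hypothetical and chosen only for illustration.

```python
def rejection_threshold(false_positive_cost: float, false_negative_cost: float) -> float:
    """Return the probability above which rejecting an item is the cheaper choice.

    Rejecting costs (1 - p) * false_positive_cost on average, and accepting
    costs p * false_negative_cost, where p is the probability that the item
    is illegitimate. Rejecting is cheaper once p exceeds the value returned.
    """
    return false_positive_cost / (false_positive_cost + false_negative_cost)


# Assumed costs: one lost legitimate item is as bad as ten passed illegitimate ones.
threshold = rejection_threshold(false_positive_cost=10.0, false_negative_cost=1.0)
print(f"reject only when p > {threshold:.3f}")
```

Under these assumed costs an item should be rejected only when the filter is roughly 91% sure it is illegitimate, which is why a low false-positive rate matters more than catching every unwanted item.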

Cost of Filtering

•Manual filtering has become prohibitively expensive

•Maintenance of static filters costs time & money

•Time spent maintaining keywords or updating software delays response

•The “arms race” often results in ever-escalating costs


Methodologies

•Manual filtering is prohibitive in terms of time

•Static filtering based on heuristics and keywords does not adapt except via manual updates

•Bayesian filtering is dynamic, adapting with each new item scanned and/or marked


What is Bayesian Filtering?

•Uses a naïve Bayes classifier, which is based on Bayes' theorem

•The classifier categorizes items adaptively using probabilities and has a low rate of false positives

•Best-known use is spam filtering, often credited to initial work by Paul Graham (“A Plan for Spam”) in 2002


Naïve Bayes Classifier

•Applies Bayes' theorem under the assumption that the features (e.g., words) are independent of one another, which is rarely true, hence “naïve”

•The classifier can start with initial assumptions, i.e., probabilities that words occur in legitimate or illegitimate messages

•It is trained over time and adapts. If the final probability reaches some threshold, the item is rejected. Because it adapts, it generally outperforms static keyword filtering.

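The independence assumption and the threshold test above can be sketched as follows. This uses the combination rule popularized by Graham's “A Plan for Spam”, which also assumes equal prior odds of spam and non-spam; the function name, the per-word values, and the threshold of 0.9 are hypothetical choices for illustration, not values taken from the slides.

```python
import math


def combine(word_probabilities: list[float]) -> float:
    """Combine per-word spam probabilities, treating the words as independent.

    Computes prod(p) / (prod(p) + prod(1 - p)) in log space so that long
    messages do not underflow. Each p must lie strictly between 0 and 1.
    """
    log_spam = sum(math.log(p) for p in word_probabilities)
    log_ham = sum(math.log(1.0 - p) for p in word_probabilities)
    diff = log_ham - log_spam
    if diff > 700:  # math.exp would overflow; the score is effectively 0
        return 0.0
    # 1 / (1 + exp(diff)) equals prod(p) / (prod(p) + prod(1 - p)).
    return 1.0 / (1.0 + math.exp(diff))


THRESHOLD = 0.9  # assumed rejection threshold, chosen for illustration

message_score = combine([0.98, 0.90, 0.20])
rejected = message_score >= THRESHOLD
print(f"{message_score:.3f}", rejected)
```

Two strongly spam-like words outweigh one legitimate-looking word here, so the message ends up above the assumed threshold and is rejected.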

Bayes' Theorem

•First presented in 1763 (posthumously), based on work by the mathematician Thomas Bayes

•Pr(A|B) = Pr(B|A) · Pr(A) / Pr(B)

•Relates the conditional probability Pr(A|B) to its inverse, Pr(B|A)

•Currently has practical use in many fields

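A small numeric check may make the formula above more concrete. The mailbox below is entirely hypothetical (100 messages, with made-up counts for the word “offer”); the point is only that the theorem gives the same answer as counting directly.

```python
def posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' theorem: Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical mailbox: 100 messages, 40 of them spam (event A).
# 12 messages contain the word "offer" (event B); 10 of those 12 are spam.
p_a = 40 / 100          # Pr(A): a message is spam
p_b = 12 / 100          # Pr(B): a message contains the word
p_b_given_a = 10 / 40   # Pr(B|A): a spam message contains the word

p_a_given_b = posterior(p_b_given_a, p_a, p_b)

# Counting directly: 10 of the 12 messages containing the word are spam.
direct_count = 10 / 12
print(p_a_given_b, direct_count)
```

Both values come out to about 0.83, so in this made-up mailbox a message containing the word is spam roughly five times out of six.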

Bayesian Filtering Usage

•Uses each user's input to build statistics specific to that user

•The probability matrix changes over time based on scanned messages and user decisions

•The matrix is used to calculate the probability that a message is unwanted

•The matrix adapts quickly to new input, which yields surprisingly good results

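How such a per-user matrix might change with each user decision can be sketched as below. This is an illustrative data structure, not the layout used by any particular product: the class name `WordStats`, its methods, and the neutral default of 0.5 for unseen words are all assumptions.

```python
from collections import Counter


class WordStats:
    """Per-user word counts, updated every time the user labels a message."""

    def __init__(self) -> None:
        self.spam_messages = 0
        self.ham_messages = 0
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def train(self, words: list[str], is_spam: bool) -> None:
        """Record one labelled message; each distinct word is counted once."""
        if is_spam:
            self.spam_messages += 1
            self.spam_counts.update(set(words))
        else:
            self.ham_messages += 1
            self.ham_counts.update(set(words))

    def spam_probability(self, word: str) -> float:
        """Share of the word's occurrence rate that comes from unwanted mail."""
        spam_rate = self.spam_counts[word] / self.spam_messages if self.spam_messages else 0.0
        ham_rate = self.ham_counts[word] / self.ham_messages if self.ham_messages else 0.0
        if spam_rate + ham_rate == 0.0:
            return 0.5  # assumed neutral default for words never seen
        return spam_rate / (spam_rate + ham_rate)


stats = WordStats()
stats.train(["cheap", "pills"], is_spam=True)
stats.train(["meeting", "agenda"], is_spam=False)
before = stats.spam_probability("agenda")  # seen only in legitimate mail so far

# The user marks a new message containing "agenda" as unwanted; the entry shifts.
stats.train(["agenda", "pills"], is_spam=True)
after = stats.spam_probability("agenda")
print(before, after)
```

A single user decision moves the entry for “agenda” from 0 to about 0.33, which illustrates how quickly the statistics can follow new input when the training set is small.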

Example Matrix

[Figure: example probability matrix]

Example

•Suppose the word “guarantee” occurs in 500 of 2000 spam emails, but only in 5 of 1000 non-spam emails

•The spam probability for this word is then (500 / 2000) / ((500 / 2000) + (5 / 1000)) = 0.25 / 0.255 ≈ 0.98

•This probability is combined with those of the other words in the message to compute the probability that the entire message is spam.

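The arithmetic above can be reproduced in a few lines. The function name is hypothetical; the first call uses the counts from the example, while the second call uses made-up counts to show how a word that is also common in legitimate mail scores lower.

```python
def word_spam_probability(spam_with_word: int, spam_total: int,
                          ham_with_word: int, ham_total: int) -> float:
    """Spam probability of one word, computed from per-class occurrence rates."""
    spam_rate = spam_with_word / spam_total
    ham_rate = ham_with_word / ham_total
    return spam_rate / (spam_rate + ham_rate)


# Counts from the example: "guarantee" in 500 of 2000 spam and 5 of 1000 non-spam emails.
p_guarantee = word_spam_probability(500, 2000, 5, 1000)
print(round(p_guarantee, 2))  # 0.98

# Hypothetical contrast: the same word appearing in 50 of 1000 non-spam emails.
p_common_word = word_spam_probability(500, 2000, 50, 1000)
print(round(p_common_word, 2))  # 0.83
```

Because the formula works with rates rather than raw counts, the different sizes of the spam and non-spam collections (2000 versus 1000) do not by themselves bias the result.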

Bayesian Poisoning

•Attempts to fool Bayesian filtering (BF) systems by adding irrelevant words (often hidden) to messages

•Type I attacks attempt to get messages through the filter. They can be passive or active; active attacks give the sender feedback via a “web bug” or other means

•Type II attacks attempt to cause false positives, i.e., force desirable messages to be rejected


Poisoning Effectiveness

•Passive attacks are rarely effective because each filter is individually trained and the sender gets no feedback

•Active attacks can be highly effective at first, if the recipient's system loads the “web bugs”

•All attacks lose effectiveness as the filter adjusts to incoming traffic

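The effect of a Type I attack, and why it fades, can be illustrated with a toy calculation. All numbers below are invented: the per-word probabilities, the five padding words, and the value 0.60 that the padding words are assumed to reach after retraining. The combination rule is the naive product form, which assumes independent words and equal priors.

```python
def combine(word_probabilities):
    """Naive combination: prod(p) / (prod(p) + prod(1 - p))."""
    spam, ham = 1.0, 1.0
    for p in word_probabilities:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)


# Hypothetical per-word spam probabilities for an unwanted message.
spam_words = [0.98, 0.95]
# Padding words that the targeted filter currently treats as legitimate.
padding = [0.10] * 5

plain_score = combine(spam_words)
poisoned_score = combine(spam_words + padding)

# Once the user marks such messages as unwanted, the padding words drift
# toward spam (assumed here to reach 0.60) and the padding stops helping.
retrained_score = combine(spam_words + [0.60] * 5)
print(plain_score, poisoned_score, retrained_score)
```

In this toy case the padding pulls the score from above 0.99 to below 0.1, but only as long as the attacker knows which words this particular filter considers legitimate; after retraining, the same padding pushes the score back above 0.99.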

Products that use Bayesian Filtering

•AlienCamel

•DSPAM

•Eudora

•eXpurgate

•Junk-Out

•Mozilla

•Pegasus Mail

•POPFile

•Postini

•SeaMonkey

•SpamAssassin

•SpamBayes

•SpamProbe

•Thunderbird

Summary

•BF adapts to individual needs

•BF is highly effective

•BF adapts more quickly than other solutions

•BF is resistant to “poisoning”


References

•[1] Sahami, M., et al. “A Bayesian Approach to Filtering Junk E-Mail”, 1998

•[2] Graham, Paul. “A Plan for Spam”, 2002

•[3] Graham-Cumming, John. “Does Bayesian poisoning exist?”, 2006


References, cont.

•[4] Naive Bayes Classifier, Wikipedia, 2007

•[5] Bayes Theorem, Wikipedia, 2007
