Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Upload: carlos-laorden

Post on 18-Dec-2014

DESCRIPTION

Presentation at CEAS 2011 International conference of the paper: Enhancing Scalability in Anomaly-based Email Spam Filtering

TRANSCRIPT

Page 1: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Anomaly-based Spam filtering

Carlos Laorden

Page 2: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Billions of daily losses in productivity

Page 3: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Infected computers

Page 4: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Stolen credentials

Page 5: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Food? NO!

Page 6: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Monty Python’s Flying Circus

Page 7: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

WHAT YOU GOT, THEN?

SPAM, EGG, SPAM, SPAM, BACON AND SPAM.

SPAM, SPAM, SPAM, BAKED BEANS AND SPAM.

ANYTHING WITHOUT SPAM?

I DON’T LIKE SPAM!!

UGH!

Page 8: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Something that repeats and repeats until it becomes annoying

Page 9: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

It is a real problem for Information Security

Page 11: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

We must fight

Page 12: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Anti-spam methods

Pre-sending: new protocols, increase sending costs, increase risks for spammers

Post-sending: e-mail sender, e-mail content

Page 13: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Usually supervised approaches

Page 14: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Significant labelling work is needed

Page 16: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

But, is this possible?

Page 17: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

I mean, is this possible...

Page 18: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

...without losing accuracy drastically?

Page 19: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

YES

Page 20: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Anomaly Detection

Page 21: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

[Figure: documents D1 to D11 from the SpamAssassin and Ling Spam corpora plotted in a term space with axes t1, t2 and t3; scattered words ("this", "word", "has", "no", "interest") appear next to each corpus name]
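The term-space picture on this slide (documents D1 to D11 placed on term axes t1, t2, t3) can be illustrated with a minimal sketch. It assumes raw term counts as weights and whitespace tokenisation; the slides do not state the weighting scheme, and the function names and sample texts are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted list of distinct lowercase tokens across all documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def to_vector(document, vocabulary):
    """Map a document to raw term counts, one entry per vocabulary term."""
    counts = Counter(document.lower().split())
    return [float(counts[term]) for term in vocabulary]


# Hypothetical sample texts, not taken from SpamAssassin or Ling Spam.
docs = ["cheap pills cheap offer", "meeting agenda offer"]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
```

Each e-mail becomes one point in a space with one axis per term, which is what makes distance measures between e-mails meaningful.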

Page 22: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Anomaly detection

[Figure: an unknown e-mail (?) at distance d from the known e-mails; is d > threshold?]

Page 23: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Manhattan distance

Euclidean distance
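As a reminder of what the two measures compute, here is a minimal sketch for two equal-length numeric vectors. The function names are illustrative and do not come from the slides.

```python
import math


def manhattan(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared coordinate differences (L2 distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For the vectors [0, 0] and [3, 4], the Manhattan distance is 7 and the Euclidean distance is 5.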

Page 24: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Anomaly detection

[Figure: an unknown e-mail (?) with several distances d to the known e-mails; which d should be used?]

Page 25: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Minimum distance

Maximum distance

Mean distance
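The three combination rules can be read as ways of collapsing the distances from one e-mail to every reference e-mail into a single deviation value. The sketch below assumes that reading; `deviation` and its `rule` parameter are hypothetical names, and Euclidean distance is used only as an example.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def deviation(sample, reference, distance=euclidean, rule="mean"):
    """Combine the distances from `sample` to every reference vector."""
    distances = [distance(sample, r) for r in reference]
    if rule == "min":
        return min(distances)
    if rule == "max":
        return max(distances)
    if rule == "mean":
        return sum(distances) / len(distances)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical reference set of two vectors.
reference = [[3, 4], [6, 8]]
```

With the sample [0, 0] the individual distances are 5 and 10, so the minimum rule gives 5, the maximum rule gives 10 and the mean rule gives 7.5.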

Page 26: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Minimum distance

Maximum distance

Mean distance

Manhattan distance

Euclidean distance

Page 27: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

10 different thresholds

Page 28: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Anomaly detection

[Figure: distance d compared with the threshold; d < threshold on one side, d > threshold on the other]
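The threshold test on this slide can be sketched as a simple comparison. Which side counts as spam depends on what the reference set contains; the sketch assumes only that a deviation above the threshold is flagged as an anomaly. The names `is_anomalous` and `sweep`, and the sample values, are hypothetical.

```python
def is_anomalous(deviation, threshold):
    """Flag a sample whose deviation from the reference set exceeds the threshold."""
    return deviation > threshold


def sweep(deviations, thresholds):
    """For each candidate threshold, count how many samples would be flagged."""
    return {t: sum(is_anomalous(d, t) for d in deviations) for t in thresholds}


# Hypothetical deviations for three e-mails and two candidate thresholds.
flag_counts = sweep([0.5, 1.5, 2.5], [1.0, 2.0])
```

Sweeping several thresholds in this way shows how the number of flagged e-mails falls as the threshold rises, which is the trade-off that trying multiple thresholds explores.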

Page 29: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

min

max

Page 30: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Minimum distance

Maximum distance

Mean distance

Manhattan distance

Euclidean distance

10 thresholds

Page 31: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

[Figure: an unknown e-mail (?) with a distance d to each of the known e-mails]

Page 32: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

High Processing Overhead

Page 33: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

1. Representation of the emails

1.5 Data clustering

2. Anomaly Detection

Page 34: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

QT clustering algorithm
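A minimal sketch of quality threshold (QT) clustering follows, in its commonly described greedy form: grow a candidate cluster around every remaining point while its diameter stays within the threshold, keep the largest candidate, remove its points and repeat. Replacing each cluster with its centroid to shrink the reference set is an assumption about how the clustering is used here, and all function names are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def farthest(points, i, members, distance):
    """Largest distance from point i to any current cluster member."""
    return max(distance(points[i], points[j]) for j in members)


def qt_cluster(points, max_diameter, distance=euclidean):
    """Greedy QT clustering; returns a list of clusters (lists of points)."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            others = [i for i in remaining if i != seed]
            while others:
                # Add the point that enlarges the cluster diameter the least.
                nxt = min(others, key=lambda i: farthest(points, i, candidate, distance))
                if farthest(points, nxt, candidate, distance) > max_diameter:
                    break
                candidate.append(nxt)
                others.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        clusters.append([points[i] for i in best])
        remaining = [i for i in remaining if i not in best]
    return clusters


def centroid(cluster):
    """Coordinate-wise mean of the vectors in a cluster."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]


# Hypothetical 2-D points: two tight pairs and one outlier.
points = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]
clusters = qt_cluster(points, max_diameter=2.0)
centroids = [centroid(c) for c in clusters]
```

Here five points collapse to three centroids, so a later distance-based check would compare a new e-mail against three vectors instead of five. The sketch is cubic or worse in the number of points, which is acceptable for illustration but not for a large corpus.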

Page 35: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Minimum distance

Maximum distance

Mean distance

Manhattan distance

Euclidean distance

10 thresholds

QT thresholds: 1.50, 1.75, 2.00

Page 36: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Results

Page 37: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Ling Spam

Distance measure | Quality thresholds | % average reduction

Euclidean | 1.50, 1.75, 2.00 | 13.21%, 57.10%, 89.72%, 99.94%

Manhattan | 1.50, 1.75, 2.00 | 33.75%, 46.78%, 62.47%, 99.94%

SpamAssassin

Distance measure | Quality thresholds | % average reduction

Euclidean | 1.50, 1.75, 2.00 | 89.78%, 97.63%, 99.34%, 99.96%

Manhattan | 1.50, 1.75, 2.00 | 93.59%, 96.81%, 98.57%, 99.96%

Page 40: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

SpamAssassin

Detects more than 95% of junk emails

Less than 5% of legitimate emails misclassified

Page 41: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Ling Spam

Detects more than 95% of junk emails

An improvable 10% of legitimate emails misclassified

Page 42: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

SpamAssassin

Distance measure | Previous work | Clustering Reduction

Euclidean | 93.99% | 94.39%

Manhattan | 96.50% | 95.37%

Ling Spam

Distance measure | Previous work | Clustering Reduction

Euclidean | 95.02% | 95.54%

Manhattan | 83.85% | 89.60%

Page 43: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Suitable to overcome the amount of unclassified spam e-mails

Page 46: Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Will we see the END of spam?
