Enhancing Scalability in Anomaly-based Email Spam Filtering - CEAS 2011

Post on 18-Dec-2014

DESCRIPTION

Presentation at CEAS 2011 International conference of the paper: Enhancing Scalability in Anomaly-based Email Spam Filtering

TRANSCRIPT

Anomaly-based Spam filtering

Carlos Laorden

Billions of daily losses in

productivity

Infected computers

Stolen credentials

Food? NO!

Monty Python’s Flying Circus

WHAT YOU GOT, THEN? SPAM, EGG,

SPAM, SPAM, BACON AND

SPAM.

SPAM, SPAM, SPAM, BAKED BEANS AND

SPAM.

ANYTHING WITHOUT

SPAM?

I DON’T LIKE

SPAM!!

UGH!

Something that repeats and repeats until it becomes annoying

It is a

real problem for Information Security

We must

fight

Anti-spam methods

Pre-sending

New protocols

Post-sending

Increase sending costs

Increase risks for spammers

E-mail sender

E-mail content

Usually

supervised approaches

A significant

labelling work is needed

But, is this

possible?

I mean, is this

possible...

...without

losing

accuracy

drastically?

YES

Anomaly Detection

no interest this SpamAssassin word has

this has Ling Spam no interest word

[Figure: vector space with term axes t1, t2, t3 and documents D1 to D11 (SpamAssassin, Ling Spam); one point marked "??"]
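The figure places each e-mail as a point on term axes. A minimal sketch of that representation, assuming whitespace tokenisation and raw term counts (the slides do not state the weighting scheme; `build_vocabulary` and `to_vector` are hypothetical names):

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect every distinct lower-cased term, sorted for a stable axis order."""
    return sorted({term for doc in documents for term in doc.lower().split()})


def to_vector(document, vocabulary):
    """Map one e-mail to a vector with one term count per vocabulary axis."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


emails = ["spam spam egg", "egg bacon"]
vocabulary = build_vocabulary(emails)          # one axis per term (t1, t2, t3, ...)
vectors = [to_vector(e, vocabulary) for e in emails]
```

Each vector can then be compared with the others using a distance measure.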

Anomaly detection

d > threshold?

Manhattan distance

Euclidean distance
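The two distance measures named here can be sketched as follows, assuming e-mails are already equal-length numeric vectors (function names are illustrative):

```python
import math


def manhattan(x, y):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```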

Anomaly detection

Minimum distance

Maximum distance

Mean distance

Manhattan distance

Euclidean distance

10 different thresholds
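One way to read the combination rules above: compute the distance from a new e-mail to every vector in the reference ("normal") set, then keep the minimum, maximum or mean as a single score. This is a sketch under that assumption; the comparison direction (`score > threshold` means anomalous) is also assumed, and all names are hypothetical:

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def anomaly_score(email, normal_set, distance=euclidean, combine="mean"):
    """Collapse the distances to every reference vector into one score."""
    distances = [distance(email, ref) for ref in normal_set]
    if combine == "min":
        return min(distances)
    if combine == "max":
        return max(distances)
    if combine == "mean":
        return sum(distances) / len(distances)
    raise ValueError(f"unknown combination rule: {combine}")


def is_anomalous(email, normal_set, threshold, **kwargs):
    """Assumption: an e-mail far from the normal set is flagged as anomalous."""
    return anomaly_score(email, normal_set, **kwargs) > threshold
```

Every new e-mail is compared against the whole reference set, so the cost of one decision grows linearly with the number of reference vectors.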

Anomaly detection

d < threshold

d > threshold

min

max
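The slides mention 10 thresholds and a min/max range but not how the thresholds are chosen. A minimal sketch, assuming they are spaced evenly between the smallest and largest observed score (`threshold_grid` and `flag_rate` are hypothetical helpers):

```python
def threshold_grid(scores, count=10):
    """Evenly spaced thresholds from min(scores) to max(scores), inclusive."""
    low, high = min(scores), max(scores)
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def flag_rate(scores, threshold):
    """Fraction of e-mails whose score exceeds the threshold."""
    return sum(score > threshold for score in scores) / len(scores)
```

Sweeping `flag_rate` over the grid shows how the share of flagged e-mails changes as the threshold moves from the minimum to the maximum score.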

High Processing Overhead

1. Representation of the emails

2. Anomaly Detection

1.5 Data clustering

QT clustering

algorithm
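A sketch of Quality Threshold (QT) clustering in its generic form, which may differ from the variant used in the paper. Assumptions: Euclidean distance, the threshold bounds the cluster diameter, and each cluster is later replaced by its centroid. All names are illustrative.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def _reach(index, members, points, distance):
    """Largest distance from a candidate point to any current cluster member."""
    return max(distance(points[index], points[m]) for m in members)


def qt_cluster(points, max_diameter, distance=euclidean):
    """Repeatedly keep the largest candidate cluster whose diameter fits."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            others = [i for i in remaining if i != seed]
            while others:
                nearest = min(others, key=lambda i: _reach(i, candidate, points, distance))
                if _reach(nearest, candidate, points, distance) > max_diameter:
                    break  # adding any further point would exceed the diameter
                candidate.append(nearest)
                others.remove(nearest)
            if len(candidate) > len(best):
                best = candidate
        clusters.append([points[i] for i in best])
        remaining = [i for i in remaining if i not in best]
    return clusters


def centroid(cluster):
    """Mean vector that stands in for the whole cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]
```

Comparing a new e-mail against one centroid per cluster, instead of against every reference vector, is what reduces the number of distance computations. The clustering itself is expensive, but it runs once, before filtering.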

Minimum distance

Maximum distance

Mean distance

Manhattan distance

Euclidean distance

10 thresholds

QT: 1.50, 1.75, 2.00

Results

Ling Spam

Distance measure   Quality threshold   % Average reduction
Euclidean          1.50, 1.75, 2.00    13.21%, 57.10%, 89.72%, 99.94%
Manhattan          1.50, 1.75, 2.00    33.75%, 46.78%, 62.47%, 99.94%

SpamAssassin

Distance measure   Quality threshold   % Average reduction
Euclidean          1.50, 1.75, 2.00    89.78%, 97.63%, 99.34%, 99.96%
Manhattan          1.50, 1.75, 2.00    93.59%, 96.81%, 98.57%, 99.96%
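The slides do not define the reduction percentage. A plausible reading, assumed here, is the share of reference vectors that disappear once each cluster is replaced by a single representative (`reduction_percent` is a hypothetical helper):

```python
def reduction_percent(original_count, reduced_count):
    """Percentage of reference vectors removed by the clustering step."""
    return 100.0 * (original_count - reduced_count) / original_count
```

Under this reading, 1000 reference e-mails condensed into 100 centroids would be a 90% reduction.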

SpamAssassin

Detects more than 95% of junk emails

Less than 5% of misclassified legitimate emails

Ling Spam

Detects more than 95% of junk emails

An improvable 10% of misclassified legitimate emails
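The two figures quoted per corpus correspond to a spam detection rate and a false positive rate. A minimal sketch of how they could be computed, assuming string labels "spam" and "ham" (the label names and function names are illustrative):

```python
def detection_rate(labels, predictions):
    """Share of true spam e-mails that were flagged as spam."""
    spam_predictions = [p for l, p in zip(labels, predictions) if l == "spam"]
    return sum(p == "spam" for p in spam_predictions) / len(spam_predictions)


def false_positive_rate(labels, predictions):
    """Share of legitimate e-mails that were wrongly flagged as spam."""
    ham_predictions = [p for l, p in zip(labels, predictions) if l == "ham"]
    return sum(p == "spam" for p in ham_predictions) / len(ham_predictions)
```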

SpamAssassin

Distance measure   Previous work   Clustering reduction
Euclidean          93.99%          94.39%
Manhattan          96.50%          95.37%

Ling Spam

Distance measure   Previous work   Clustering reduction
Euclidean          95.02%          95.54%
Manhattan          83.85%          89.60%

Suitable to

overcome the amount of unclassified spam e-mails

Will we see the END of spam?

