Using Traffic Analysis to Detect Email Spam
Richard Clayton
TERENA, Lyngby, 22nd May 2007
Summary
• Log processing for customers
• Log processing for non-customers
• Looking at sampled sFlow data
What problems do ISPs have?
• Insecure customers
– very few real spammers sending directly!
• Botnets
– compromised end-user machines
• SOCKS proxies &c
– misconfiguration
• SMTP AUTH
– Exchange “admin” accounts + many others
[Diagram: customers and spammers; ISP email server (smarthost); destinations yahoo.com, hotmail.com, example.com, example.co.uk, beispiel.de, etc.; complaints; ISP abuse@ team]
ISP’s Real Problem
• Blacklisting of IP ranges & smarthosts
– [email protected]
• Rapid action necessary to ensure continued service to all other customers
• But reports may go to the blacklist and not to the ISP (or will lack essential details)
[Diagram: customers and spammers; ISP email server (smarthost); destinations yahoo.com, hotmail.com, example.com, example.co.uk, beispiel.de, etc.; complaints; BLACKLIST]
Spotting outgoing spam
• Expensive to examine outgoing content
• Legal/contractual issues with blocking
– “false positives” could cost you customers
• Volume is not a good indicator of spam
– many customers with occasional mailshots
– daily limits only suitable for consumers
• “Incorrect” sender doesn’t indicate spam
– many customers with multiple domains
Key insight
• Lots of spam is to ancient email addresses
• Lots of spam is to invented addresses
• Lots of spam is blocked by remote filters
• Can process server logs to pick out this information. Spam has delivery failures whereas legitimate email mainly works
[Diagram: customers and spammers; ISP email server (smarthost); destinations yahoo.com, hotmail.com, example.com, example.co.uk, beispiel.de, etc.; complaints; logs; ISP abuse@ team]
Log processing heuristics
• Report “too many” failures to deliver
– more than 20 works pretty well
• Ignore “bounces”!
– have null “<>” return path, these often fail
– detect rejection daemons without <> paths
• Ignore “mailing lists”
– most destinations work, only some fail (10%)
– more than one mailing list is a spam indicator!
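The heuristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used on the real logs: the `Delivery` record shape, the grouping of a "mailing list" by envelope sender, the `MIN_LIST_SIZE` cut-off and the reading of "10%" as a maximum failure fraction are all hypothetical. Detecting rejection daemons that do not use the null path is left out.

```python
from collections import defaultdict
from dataclasses import dataclass

FAILURE_THRESHOLD = 20        # slide: "more than 20 works pretty well"
LIST_FAILURE_FRACTION = 0.10  # assumption: a list has at most 10% failures
MIN_LIST_SIZE = 50            # assumption: smallest mailing treated as a list


@dataclass(frozen=True)
class Delivery:
    customer: str      # hypothetical customer identifier from the smarthost log
    return_path: str   # envelope sender; "" stands for the null "<>" path
    recipient: str
    delivered: bool


def report(deliveries):
    """Return {customer: [reasons]} for customers worth a closer look."""
    by_sender = defaultdict(list)
    for d in deliveries:
        if d.return_path == "":          # bounce: ignore, these often fail
            continue
        by_sender[(d.customer, d.return_path)].append(d)

    failures = defaultdict(int)
    lists = defaultdict(int)
    for (customer, _sender), attempts in by_sender.items():
        failed = sum(1 for a in attempts if not a.delivered)
        is_list = (len(attempts) >= MIN_LIST_SIZE
                   and failed / len(attempts) <= LIST_FAILURE_FRACTION)
        if is_list:
            lists[customer] += 1         # list failures are not counted
        else:
            failures[customer] += failed

    reasons = defaultdict(list)
    for customer, count in failures.items():
        if count > FAILURE_THRESHOLD:
            reasons[customer].append(f"{count} failed deliveries")
    for customer, count in lists.items():
        if count > 1:                    # more than one list is an indicator
            reasons[customer].append(f"{count} mailing lists")
    return dict(reasons)
```

Grouping by envelope sender keeps a genuine mailing list from pushing a customer over the failure threshold, while still counting how many list-like senders each customer has.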
Bonus! also detects viruses
• Common for mass mailing “worms” to use address book (mainly valid addresses)
• But remote sites may reject malware
• Virus authors don’t know how to say HELO
• So virus infections are also detected
AND VERY USEFUL!
ISP email handling
[Diagram: Smarthost; MX host; The Internet]
Heuristics for incoming email
• Simple heuristics on failures work really well
– just as for smarthost
• Multiple HELO lines very common
– often match MAIL FROM (to mislead)
– may match RCPT TO (? authenticator ?)
• Look for outgoing email to the Internet
• Pay attention to spam filter results
– but need to discount forwarding
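The “multiple HELO lines” check can be illustrated with a short Python sketch. It assumes the incoming server log has already been reduced to `(source_ip, helo, mail_from)` tuples. That shape, the function names and the `min_distinct` parameter are hypothetical, and the sketch covers only the HELO and MAIL FROM comparison, not the RCPT TO or spam-filter signals.

```python
from collections import defaultdict


def domain_of(address):
    """Return the lower-cased domain part of an email address, or ''."""
    return address.rpartition("@")[2].lower() if "@" in address else ""


def helo_report(sessions, min_distinct=2):
    """Summarise sources that announce themselves under several HELO names.

    sessions: iterable of (source_ip, helo, mail_from) tuples (assumed shape).
    Returns {ip: {"distinct_helos": n, "helo_matches_sender": m}} for every
    source that used at least `min_distinct` different HELO strings.
    """
    helos = defaultdict(set)
    matches = defaultdict(int)
    for ip, helo, mail_from in sessions:
        helo = helo.lower()
        helos[ip].add(helo)
        dom = domain_of(mail_from)
        # Count sessions where the HELO name is the sender's domain or a host
        # under it, e.g. "lkrw.hotmail.com" with a hotmail.com sender.
        if dom and (helo == dom or helo.endswith("." + dom)):
            matches[ip] += 1
    return {
        ip: {"distinct_helos": len(names), "helo_matches_sender": matches[ip]}
        for ip, names in helos.items()
        if len(names) >= min_distinct
    }
```

A legitimate sending host normally uses one stable HELO name, so the count of distinct names per source address is the main signal here. The match count is reported alongside it rather than used as a filter.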
2007-05-19 10:47:15 [email protected] Size=2199
  !!! [email protected]
  !!! [email protected]
  !!! [email protected]
  -> [email protected]
  -> [email protected]
2007-05-19 10:50:22 [email protected] Size=2206
  !!! [email protected]
  !!! [email protected]
  -> [email protected]
  -> [email protected]
  -> [email protected]
  -> [email protected]
  and 31 more valid destinations
2007-05-19 10:59:15 [email protected] Size=2228
  !!! [email protected]
  -> [email protected]
  -> [email protected]
  -> [email protected]
  -> [email protected]
  -> [email protected]
  and 44 more valid destinations
HELO = lrhnow.usa.net
2007-05-19 23:11:22 [email protected] Size= 8339 -> [email protected]
HELO = lkrw.hotmail.com
2007-05-19 23:11:24 [email protected] Size=11340 -> [email protected]
HELO = pshw.netscape.net
2007-05-19 23:14:52 [email protected] Size= 6122 -> [email protected]
HELO = zmgp.cs.com
2007-05-19 23:18:06 [email protected] Size= 6925 -> [email protected]
Email log processing @ Demon
Detection of spam (black) and viruses (red)
Incoming reports (all sources)
spam (black), viruses (red), reports (blue)
spamHINTS research project
[Diagram: The Internet; LINX]
LINX samples 1 in 2000 packets (using sFlow) and makes the port 25 traffic available for analysis…
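To get a feel for what 1 in 2000 sampling means for analysis, here is a small Python sketch. It assumes each packet is sampled independently with probability 1/2000, which is a simplification of how sFlow picks packets, and the function names are illustrative only.

```python
SAMPLING_RATE = 2000  # slide: LINX samples 1 in 2000 packets


def estimated_packets(samples_seen, rate=SAMPLING_RATE):
    """Scale a count of sampled packets up to an estimate of the true count."""
    return samples_seen * rate


def chance_of_being_seen(packets_sent, rate=SAMPLING_RATE):
    """Probability that at least one of `packets_sent` packets is sampled,
    assuming independent sampling with probability 1/rate per packet."""
    return 1.0 - (1.0 - 1.0 / rate) ** packets_sent
```

Under this assumption a host that sends 20 packets on port 25 has roughly a 1% chance of appearing in the data at all, and even a host that sends 2000 packets is seen only about 63% of the time.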
Known “open server”
[Chart: x-axis 0 to 23, y-axis 0 to 20]
Another known “open server”
[Chart: x-axis 0 to 23, y-axis 0 to 20]
Look for excessive variation
• Look at number of hours active compared with number of four hour blocks active
• Use incoming email to Demon to pick out senders of spam and hence annotate them as good or bad…
• … did this for a large ISP, but problem is that “if it sends, it’s bad”. Nevertheless…
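The two activity measures in the first bullet can be computed as in the following Python sketch. It assumes the sampled data has been reduced to `(ip, hour_of_day)` pairs and that the four hour blocks are aligned at hours 0, 4, 8 and so on. Both are assumptions, and the sketch only derives the two numbers; how they are then compared is not specified here.

```python
from collections import defaultdict


def activity_profile(samples):
    """Return {ip: (hours_active, four_hour_blocks_active)}.

    samples: iterable of (ip, hour_of_day) pairs with hour_of_day in 0..23
    (assumed shape; one pair per sampled port 25 packet).
    """
    hours = defaultdict(set)
    for ip, hour in samples:
        if not 0 <= hour <= 23:
            raise ValueError(f"hour out of range: {hour}")
        hours[ip].add(hour)
    # Integer division by 4 maps hours 0-3 to block 0, 4-7 to block 1, etc.
    return {
        ip: (len(active), len({h // 4 for h in active}))
        for ip, active in hours.items()
    }
```

For example, a host seen in hours 0, 1, 2 and 3 comes out as `(4, 1)`, while one seen in hours 0, 4, 8, 12 and 16 comes out as `(5, 5)`, so the pair separates a short concentrated burst from activity spread across the day.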
[Chart: axis 1 to 23, series S1 to S6, scale 0 to 1]
Spamminess vs hours of activity for IPs active in 5 of 6 possible 4 hour periods
[Chart: x-axis 1 to 24, y-axis 0% to 100%]
So work continues…
• sFlow data will always be useful to feed back ongoing activity to abuse teams
• Analysis may improve when both rings instrumented and when data available in real-time (so can compare historic data)
• Still to consider variations (and lack of variations) in destination as well as time
Summary
• Processing outgoing server logs works well
– keeps smarthosts out of blacklists
• Processing incoming server logs effective
– some sites may see little “looped back” traffic
• Trying to process sampled sFlow data
– sampling is making it a real challenge
– more work needed on good distinguishers
http://www.cl.cam.ac.uk/~rnc1
CEAS papers: http://www.ceas.cc
2004: Stopping spam by extrusion detection
2005: Examining incoming server logs
2006: Early results from spamHINTS
2007: Email traffic: A qualitative snapshot