![Page 1: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/1.jpg)
Network-Level Spam Filtering
Nick Feamster
Georgia Tech
with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon Jung
![Page 2: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/2.jpg)
2
Spam: More than Just a Nuisance
• 95% of all email traffic
  – Image and PDF spam (PDF spam ~12%)
• As of August 2007, one in every 87 emails constituted a phishing attack
• Targeted attacks on the rise
  – 20k-30k unique phishing attacks per month
Source: CNET (January 2008), APWG
![Page 3: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/3.jpg)
3
Filtering
• Prevent unwanted traffic from reaching a user’s inbox by distinguishing spam from ham
• Question: What features best differentiate spam from legitimate mail?
  – Content-based filtering: What is in the mail?
  – IP address of sender: Who is the sender?
  – Behavioral features: How is the mail sent?
![Page 4: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/4.jpg)
Conventional Approach: Content Filters
• Trying to hit a moving target...
...and even mp3s!
PDFs Excel sheets Images
![Page 5: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/5.jpg)
5
Problems with Content Filtering
• Low cost of evasion: Spammers can easily alter features of an email’s content
• Customized emails are easy to generate: Content-based filters need fuzzy hashes over content, etc.
• High cost to filter maintainers: Filters must be continually updated as content-changing techniques become more sophisticated
![Page 6: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/6.jpg)
6
Another Approach: IP Addresses
• Problem: IP addresses are ephemeral
• Every day, 10% of senders are from previously unseen IP addresses
• Possible causes– Dynamic addressing– New infections
![Page 7: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/7.jpg)
7
Problem: Addresses Keep Changing
[Figure: y-axis "Fraction of IP Addresses"]
About 10% of IP addresses never seen before in trace
![Page 8: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/8.jpg)
8
Key Idea: Network-Based Filtering
• Filter email based on how it is sent, in addition to simply what is sent.
• Network-level properties are less malleable
  – Set of target recipients
  – Hosting or upstream ISP (AS number)
  – Membership in a botnet (spammer, hosting infrastructure)
  – Network location of sender and receiver
![Page 9: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/9.jpg)
9
Challenges (Talk Outline)
• Understanding the network-level behavior
  – What behaviors do spammers have?
  – How well do existing techniques work?
• Building classifiers using network-level features
  – Key challenge: Which features to use?
  – Two Algorithms: SpamTracker and SNARE
• Building the system
  – Dynamism: Behavior itself can change
  – Scale: Lots of email messages (and spam!) out there
![Page 10: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/10.jpg)
10
Understanding the Network-Level Behavior of Spammers
![Page 11: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/11.jpg)
11
Data: Spam and BGP
• Spam Traps: Domains that receive only spam
• BGP Monitors: Watch network-level reachability
Domain 1
Domain 2
17-Month Study: August 2004 to December 2005
![Page 12: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/12.jpg)
12
Data Collection: MailAvenger
• Highly configurable SMTP server
• Collects many useful statistics
![Page 13: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/13.jpg)
13
BGP “Spectrum Agility”
• Hijack IP address space using BGP
• Send spam
• Withdraw IP address
A small club of persistent players appears to be using this technique.
Common short-lived prefixes and ASes
61.0.0.0/8   AS 4678
66.0.0.0/8   AS 21562
82.0.0.0/8   AS 8717
~ 10 minutes
Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)
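The hijack, send, withdraw pattern above can be illustrated with a small sketch that flags prefixes announced and withdrawn within a short window. This is not the method used in the study; the event format, the function name `short_lived_announcements`, the 600-second threshold (chosen to match the "~ 10 minutes" on the slide), and all timestamps are assumptions made for illustration.

```python
def short_lived_announcements(events, max_lifetime_s=600):
    """Return (prefix, origin_as, lifetime_s) for short-lived announcements.

    events: iterable of (timestamp_s, kind, prefix, origin_as), where kind is
    "A" (announce) or "W" (withdraw). origin_as is ignored for withdrawals.
    This event format is hypothetical, not a real BGP feed format.
    """
    announced = {}  # prefix -> (announce_time, origin_as)
    short = []
    for ts, kind, prefix, origin_as in sorted(events, key=lambda e: e[0]):
        if kind == "A":
            # Keep the earliest announcement we have seen for this prefix.
            announced.setdefault(prefix, (ts, origin_as))
        elif kind == "W" and prefix in announced:
            start, asn = announced.pop(prefix)
            lifetime = ts - start
            if lifetime <= max_lifetime_s:
                short.append((prefix, asn, lifetime))
    return short


# Made-up timestamps; the first prefix/AS pair is taken from the slide,
# the second uses documentation-range values.
events = [
    (0, "A", "61.0.0.0/8", 4678),
    (100, "A", "192.0.2.0/24", 64500),
    (540, "W", "61.0.0.0/8", None),
    (90000, "W", "192.0.2.0/24", None),
]
flagged = short_lived_announcements(events)
print(flagged)
```

A real analysis would also need to correlate the flagged windows with spam arrival times from the traps, which this sketch leaves out.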
![Page 14: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/14.jpg)
14
Why Such Big Prefixes?
• Visibility: Route typically won’t be filtered (nice and short)
• Flexibility: Client IPs can be scattered throughout dark space within a large /8
  – Same sender usually returns with different IP addresses
![Page 15: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/15.jpg)
15
Other Findings
• Top senders: Korea, China, Japan
  – Still about 40% of spam coming from U.S.
• More than half of sender IP addresses appear less than twice
• ~90% of spam sent to traps from Windows
![Page 16: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/16.jpg)
16
What about IP-based blacklists?
![Page 17: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/17.jpg)
17
Two Metrics
• Completeness: The fraction of spamming IP addresses that are listed in the blacklist
• Responsiveness: The time for the blacklist to list the IP address after the first occurrence of spam
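As a rough illustration of the two metrics, the sketch below computes completeness as a set ratio and responsiveness as a per-IP delay. The function names, the dictionary layout, and the sample timestamps are all hypothetical; the addresses come from documentation ranges and the dates are made up.

```python
from datetime import datetime, timedelta


def completeness(spam_ips, listed_ips):
    """Fraction of spamming IP addresses that appear in the blacklist."""
    spam_ips = set(spam_ips)
    if not spam_ips:
        return 0.0
    return len(spam_ips & set(listed_ips)) / len(spam_ips)


def responsiveness(first_spam_seen, first_listed):
    """Delay between first spam and listing, per IP.

    IPs that were never listed are omitted. An IP listed before its first
    observed spam gets a delay of zero (an assumption of this sketch).
    """
    delays = {}
    for ip, seen in first_spam_seen.items():
        listed = first_listed.get(ip)
        if listed is not None:
            delays[ip] = max(listed - seen, timedelta(0))
    return delays


first_spam_seen = {
    "192.0.2.1": datetime(2007, 3, 1, 12, 0),
    "192.0.2.2": datetime(2007, 3, 2, 9, 0),
    "198.51.100.7": datetime(2007, 3, 3, 18, 30),
    "203.0.113.9": datetime(2007, 3, 4, 6, 15),
}
first_listed = {
    "192.0.2.1": datetime(2007, 3, 1, 18, 0),
    "192.0.2.2": datetime(2007, 3, 1, 9, 0),   # listed before first spam seen
    "198.51.100.7": datetime(2007, 3, 10, 18, 30),
}

print(completeness(first_spam_seen, first_listed))
print(responsiveness(first_spam_seen, first_listed))
```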
![Page 18: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/18.jpg)
18
Completeness and Responsiveness
• 10-35% of spam is unlisted at the time of receipt
• 8.5-20% of these IP addresses remain unlisted even after one month
Data: Trap data from March 2007, Spamhaus from March and April 2007
![Page 19: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/19.jpg)
19
What’s Wrong with IP Blacklists?
• Based on ephemeral identifier (IP address)
  – More than 10% of all spam comes from IP addresses not seen within the past two months
    • Dynamic renumbering of IP addresses
    • Stealing of IP addresses and IP address space
    • Compromised machines
• IP addresses of senders have considerable churn
• Often require a human to notice/validate the behavior
  – Spamming is compartmentalized by domain and not analyzed across domains
![Page 20: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/20.jpg)
20
How to Fix This Problem?
• Option 1: Stronger sender identity
  – Stronger sender identity/authentication may make reputation systems more effective
  – May require changes to hosts, routers, etc.
• Option 2: Filtering based on sender behavior
  – Can be done on today’s network
  – Identifying features may be tricky, and some may require network-wide monitoring capabilities
![Page 21: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/21.jpg)
21
Outline
• Understanding the network-level behavior
  – What behaviors do spammers have?
  – How well do existing techniques work?
• Building classifiers using network-level features
  – Key challenge: Which features to use?
  – Algorithms: SpamTracker and SNARE
• Building the system (SpamSpotter)
  – Dynamism: Behavior itself can change
  – Scale: Lots of email messages (and spam!) out there
![Page 22: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/22.jpg)
22
SpamTracker
• Idea: Blacklist sending behavior (“Behavioral Blacklisting”)
  – Identify sending patterns commonly used by spammers
• Intuition: Much more difficult for a spammer to change the technique by which mail is sent than it is to change the content
![Page 23: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/23.jpg)
23
SpamTracker Approach
• Construct a behavioral fingerprint for each sender
• Cluster senders with similar fingerprints
• Filter new senders that map to existing clusters
![Page 24: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/24.jpg)
24
SpamTracker: Identify Invariant
[Diagram: A known spammer at IP address 76.17.114.xxx sends spam to domain1.com, domain2.com, and domain3.com. After a DHCP reassignment or a new infection, an unknown sender at IP address 24.99.146.xxx sends spam to the same three domains. Clustering on sending behavior shows that the two behavioral fingerprints are similar.]
![Page 25: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/25.jpg)
25
Building the Classifier: Clustering
• Feature: Distribution of email sending volumes across recipient domains
• Clustering Approach
  – Build initial seed list of bad IP addresses
  – For each IP address, compute feature vector: volume per domain per time interval
  – Collapse into a single IP x domain matrix:
  – Compute clusters
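The feature-construction steps above can be sketched as follows. The slide does not say how the per-interval vectors are collapsed, so summing counts over all time intervals is an assumption of this sketch, as are the record layout and the name `build_ip_domain_matrix`. The clustering step itself is not shown.

```python
from collections import defaultdict


def build_ip_domain_matrix(log_records, domains):
    """Collapse per-interval sending volumes into one row per sender IP.

    log_records: iterable of (time_interval, sender_ip, recipient_domain),
    one record per delivered message (hypothetical layout).
    Returns {sender_ip: [message count for each domain in `domains`]}.
    """
    column = {domain: i for i, domain in enumerate(domains)}
    matrix = defaultdict(lambda: [0] * len(domains))
    for _interval, ip, domain in log_records:
        if domain in column:
            # Assumption: collapse across time by summing counts.
            matrix[ip][column[domain]] += 1
    return dict(matrix)


domains = ["domain1.com", "domain2.com", "domain3.com"]
records = [
    (0, "192.0.2.1", "domain1.com"),
    (0, "192.0.2.1", "domain2.com"),
    (1, "192.0.2.1", "domain1.com"),
    (1, "198.51.100.7", "domain3.com"),
]
matrix = build_ip_domain_matrix(records, domains)
print(matrix)
```

Each row of the resulting matrix is the per-sender vector that a clustering algorithm would take as input.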
![Page 26: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/26.jpg)
26
Clustering: Output and Fingerprint
• For each cluster, compute fingerprint vector:
• New IPs will be compared to this “fingerprint”
IP x IP Matrix: Intensity indicates pairwise similarity
![Page 27: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/27.jpg)
27
Evaluation
• Emulate the performance of a system that could observe sending patterns across many domains
  – Build clusters/train on given time interval
• Evaluate classification
  – Relative to labeled logs
  – Relative to IP addresses that were eventually listed
![Page 28: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/28.jpg)
28
Data
• 30 days of Postfix logs from email hosting service
  – Time, remote IP, receiving domain, accept/reject
  – Allows us to observe sending behavior over a large number of domains
  – Problem: About 15% of accepted mail is also spam
    • Creates problems with validating SpamTracker
• 30 days of SpamHaus database in the month following the Postfix logs
  – Allows us to determine whether SpamTracker detects some sending IPs earlier than SpamHaus
![Page 29: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/29.jpg)
29
Classification Results
[Figure: SpamTracker score for ham and for spam]
Not always so accurate!
![Page 30: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/30.jpg)
30
Improving Classification
• Lower overhead
• Faster detection
• Better robustness (i.e., to evasion, dynamism)
• Use additional features and combine for more robust classification
  – Temporal: interarrival times, diurnal patterns
  – Spatial: sending patterns of groups of senders
![Page 31: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/31.jpg)
31
Outline
• Understanding the network-level behavior
  – What behaviors do spammers have?
  – How well do existing techniques work?
• Building classifiers using network-level features
  – Key challenge: Which features to use?
  – Two Algorithms: SpamTracker and SNARE
• Building the system
  – Dynamism: Behavior itself can change
  – Scale: Lots of email messages (and spam!) out there
![Page 32: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/32.jpg)
32
SNARE: Automated Sender Reputation
• Goal: Sender reputation from a single packet? (or at least as little information as possible)
  – Lower overhead
  – Faster classification
  – Less malleable
• Key challenge
  – What features satisfy these properties and can distinguish spammers from legitimate senders?
![Page 33: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/33.jpg)
33
Sender-Receiver Geodesic Distance
90% of legitimate messages travel 2,200 miles or less
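As a sketch of how such a distance could be computed once sender and receiver have been geolocated, the haversine formula below gives the great-circle distance between two latitude/longitude points. Treating the Earth as a sphere of radius 3958.8 miles is an approximation, the function name is hypothetical, and the coordinates are illustrative; the slide does not say how the talk's distances were obtained.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; spherical-Earth approximation


def geodesic_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Illustrative coordinates: roughly Atlanta and London.
atlanta_to_london = geodesic_distance_miles(33.749, -84.388, 51.5074, -0.1278)
print(round(atlanta_to_london))
```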
![Page 34: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/34.jpg)
34
Density of Senders in IP Space
For spammers, k nearest senders are much closer in IP space
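One way to make "closeness in IP space" concrete is to treat each IPv4 address as a 32-bit integer and average the numeric distance to the k nearest other senders. This is only a sketch under that assumption; the function name and the sample addresses are hypothetical.

```python
import ipaddress


def mean_distance_to_k_nearest(sender_ip, other_ips, k):
    """Average numeric distance from sender_ip to its k nearest senders.

    Addresses are compared as 32-bit integers. Returns infinity when there
    are no other senders to compare against.
    """
    target = int(ipaddress.IPv4Address(sender_ip))
    distances = sorted(
        abs(int(ipaddress.IPv4Address(ip)) - target)
        for ip in other_ips
        if ip != sender_ip
    )
    nearest = distances[:k]
    if not nearest:
        return float("inf")
    return sum(nearest) / len(nearest)


others = ["192.0.2.12", "192.0.2.20", "198.51.100.1"]
density = mean_distance_to_k_nearest("192.0.2.10", others, k=2)
print(density)
```

A small value means the sender sits in a densely populated region of recently seen senders, which is the property the slide associates with spammers.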
![Page 35: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/35.jpg)
35
Local Time of Day at Sender
Spammers “peak” at different local times of day
![Page 36: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/36.jpg)
36
Combining Features: RuleFit
• Put features into the RuleFit classifier
• 10-fold cross validation on one day of query logs from a large spam filtering appliance provider
• Using only network-level features
• Completely automated
![Page 37: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/37.jpg)
37
Outline
• Understanding the network-level behavior
  – What behaviors do spammers have?
  – How well do existing techniques work?
• Building classifiers using network-level features
  – Key challenge: Which features to use?
  – Algorithms: SpamTracker and SNARE
• Building the system (SpamSpotter)
  – Dynamism: Behavior itself can change
  – Scale: Lots of email messages (and spam!) out there
![Page 38: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/38.jpg)
38
Deployment: Real-Time Blacklist
• As mail arrives, lookups received at BL
• Queries provide proxy for sending behavior
• Train based on received data
• Return score
Approach
![Page 39: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/39.jpg)
39
Challenges
• Scalability: How to collect and aggregate data, and form the signatures without imposing too much overhead?
• Dynamism: When to retrain the classifier, given that sender behavior changes?
• Reliability: How should the system be replicated to better defend against attack or failure?
• Evasion resistance: Can the system still detect spammers when they are actively trying to evade?
![Page 40: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/40.jpg)
40
Design Choice: Augment DNSBL
• Expressive queries
  – SpamHaus: $ dig 55.102.90.62.zen.spamhaus.org
    • Ans: 127.0.0.3 (=> listed in exploits block list)
  – SpamSpotter: $ dig receiver_ip.receiver_domain.sender_ip.rbl.gtnoise.net
    • e.g., dig 120.1.2.3.gmail.com.-.1.1.207.130.rbl.gtnoise.net
    • Ans: 127.1.3.97 (SpamSpotter score = -3.97)
• Also a source of data
  – Unsupervised algorithms work with unlabeled data
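The query names above can be assembled with a few lines of code. Conventional DNSBLs take the sender's IPv4 octets in reverse order followed by the list's zone. For the SpamSpotter form, the sketch below simply mirrors the layout of the example on the slide (reversed receiver IP, receiver domain, a literal "-" label, reversed sender IP); that reading of the example, along with the function names, is an assumption. No DNS lookup is performed.

```python
import ipaddress


def reverse_octets(ip):
    """Return the dotted-quad IPv4 address with its octets reversed."""
    ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(ip.split(".")))


def dnsbl_query_name(sender_ip, zone="zen.spamhaus.org"):
    """Conventional DNSBL query name: reversed sender IP plus the zone."""
    return f"{reverse_octets(sender_ip)}.{zone}"


def spamspotter_query_name(receiver_ip, receiver_domain, sender_ip,
                           zone="rbl.gtnoise.net"):
    """Query name laid out like the slide's example (layout is assumed)."""
    return (f"{reverse_octets(receiver_ip)}.{receiver_domain}.-."
            f"{reverse_octets(sender_ip)}.{zone}")


print(dnsbl_query_name("62.90.102.55"))
print(spamspotter_query_name("3.2.1.120", "gmail.com", "130.207.1.1"))
```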
![Page 41: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/41.jpg)
41
Design Choice: Sampling
Relatively small samples can achieve low false positive rates
![Page 42: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/42.jpg)
42
Dynamism: Accuracy over Time
![Page 43: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/43.jpg)
43
Improvements
• Accuracy
  – Synthesizing multiple classifiers
  – Incorporating user feedback
  – Learning algorithms with bounded false positives
• Performance
  – Caching/Sharing
  – Streaming
• Security
  – Learning in adversarial environments
![Page 44: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/44.jpg)
44
Next Steps: Applications to Scams
• Scammers host Web sites on dynamic scam hosting infrastructure
• Use the DNS to redirect users to different sites when the location of the sites move
• State of the art: Blacklist URL
• Our approach: Blacklist based on network-level fingerprints
![Page 45: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/45.jpg)
45
Example: Time Between Record Changes
Fast-flux Domains tend to change much more frequently than legitimately hosted sites
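A minimal sketch of how this feature could be measured: poll a domain's address records periodically and record the time between successive changes in the record set. The observation format, the function name, and the sample values are hypothetical; collecting the observations (the DNS polling itself) is not shown.

```python
def change_intervals(observations):
    """Seconds between successive changes of a domain's record set.

    observations: list of (timestamp_s, iterable_of_records), sorted by time.
    A change is any observation whose record set differs from the previous one.
    """
    change_times = []
    previous = None
    for ts, records in observations:
        current = frozenset(records)
        if previous is not None and current != previous:
            change_times.append(ts)
        previous = current
    return [later - earlier
            for earlier, later in zip(change_times, change_times[1:])]


# Made-up polling results using documentation-range addresses.
observations = [
    (0, {"192.0.2.1"}),
    (300, {"198.51.100.7"}),
    (600, {"198.51.100.7"}),
    (900, {"203.0.113.9"}),
    (1200, {"192.0.2.44"}),
]
intervals = change_intervals(observations)
print(intervals)
```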
![Page 46: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/46.jpg)
46
Summary: Network-Based Behavioral Filtering
• Spam increasing, spammers becoming agile
  – Content filters are falling behind
  – IP-based blacklists are evadable
    • Up to 30% of spam not listed in common blacklists at receipt. ~20% remains unlisted after a month
• Complementary approach: behavioral blacklisting based on network-level features
  – Blacklist based on how messages are sent
  – SpamTracker: Spectral clustering
    • Catches significant amounts faster than existing blacklists
  – SNARE: Automated sender reputation
    • ~90% accuracy of existing with lightweight features
  – SpamSpotter: Putting it together in an RBL system
![Page 47: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/47.jpg)
47
References
• Anirudh Ramachandran and Nick Feamster, “Understanding the Network-Level Behavior of Spammers”, ACM SIGCOMM, 2006
• Anirudh Ramachandran, Nick Feamster, and Santosh Vempala, “Filtering Spam with Behavioral Blacklisting”, ACM CCS, 2007
• Nadeem Syed, Shuang Hao, Nick Feamster, Alex Gray and Sven Krasser, “SNARE: Spatio-temporal Network-level Automatic Reputation Engine”, GT-CSE-08-02
• Anirudh Ramachandran, Shuang Hao, Hitesh Khandelwal, Nick Feamster, Santosh Vempala, “A Dynamic Reputation Service for Spotting Spammers”, GT-CS-08-09
![Page 48: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/48.jpg)
48
![Page 49: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/49.jpg)
49
Classifying IP Addresses
• Given “new” IP address, build a feature vector based on its sending pattern across domains
• Compute the similarity of this sending pattern to that of each known spam cluster
  – Normalized dot product of the two feature vectors
  – Spam score is maximum similarity to any cluster
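The scoring step above can be sketched as follows, reading "normalized dot product" as cosine similarity (dot product divided by the product of the vector lengths). The function names and the sample vectors are hypothetical, and any scaling applied to the score in the actual system is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def spam_score(sender_vector, cluster_fingerprints):
    """Maximum similarity between the sender and any known spam cluster."""
    return max(
        (cosine_similarity(sender_vector, fp) for fp in cluster_fingerprints),
        default=0.0,
    )


# Made-up fingerprints over three recipient domains.
fingerprints = [[1, 1, 1], [0, 0, 1]]
score_similar = spam_score([2, 2, 2], fingerprints)    # same direction as the first cluster
score_unrelated = spam_score([1, 0, 0], [[0, 0, 1]])   # no overlap in domains
print(score_similar, score_unrelated)
```

Because the similarity is normalized, a sender that spreads mail across domains in the same proportions as a cluster scores highly even if its total volume is different.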
![Page 50: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/50.jpg)
50
Sampling: Training Time
![Page 51: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/51.jpg)
51
Additional History: Message Size Variance
Senders of legitimate mail have a much higher variance in sizes of messages they send
[Figure: message size range for four classes of mail: certain spam, likely spam, likely ham, certain ham]
Surprising: Including this feature (and others with more history) can actually decrease the accuracy of the classifier
![Page 52: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/52.jpg)
52
Completeness of IP Blacklists
[Figure: x-axis "Number of DNSBLs listing this spammer", y-axis "Fraction of all spam received"]
~80% listed on average
~95% of bots listed in one or more blacklists
Only about half of the IPs spamming from short-lived BGP are listed in any blacklist
Spam from IP-agile senders tends to be listed in fewer blacklists
![Page 53: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/53.jpg)
53
Low Volume to Each Domain
[Figure: x-axis "Lifetime (seconds)", y-axis "Amount of Spam"]
Most spammers send very little spam, regardless of how long they have been spamming.
![Page 54: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/54.jpg)
54
Some Patterns of Sending are Invariant
[Diagram: A sender at IP address 76.17.114.xxx sends spam to domain1.com, domain2.com, and domain3.com. After a DHCP reassignment, the same sender appears at IP address 24.99.146.xxx and sends spam to the same three domains.]
• Spammer's sending pattern has not changed
• IP Blacklists cannot make this connection
![Page 55: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/55.jpg)
55
Characteristics of Agile Senders
• IP addresses are widely distributed across the /8 space
• IP addresses typically appear only once at our sinkhole
• Depending on which /8, 60-80% of these IP addresses were not reachable by traceroute when we spot-checked
• Some IP addresses were in allocated, albeit unannounced space
• Some AS paths associated with the routes contained reserved AS numbers
![Page 56: Network-Level Spam Filtering Nick Feamster Georgia Tech with Anirudh Ramachandran, Shuang Hao, Maria Konte, Nadeem Syed, Alex Gray, Santosh Vempala, Jaeyeon](https://reader036.vdocuments.mx/reader036/viewer/2022070305/55149894550346b2598b56ea/html5/thumbnails/56.jpg)
56
Early Detection Results
• Compare SpamTracker scores on “accepted” mail to the SpamHaus database
  – About 15% of accepted mail was later determined to be spam
  – Can SpamTracker catch this?
• Of 620 emails that were accepted, but sent from IPs that were blacklisted within one month
  – 65 emails had a score larger than 5 (85th percentile)
57
Evasion
• Problem: Malicious senders could add noise
  – Solution: Use smaller number of trusted domains
• Problem: Malicious senders could change sending behavior to emulate “normal” senders
  – Need a more robust set of features…