email address harvesting

Post on 08-Sep-2014

DESCRIPTION

HP Tech Forum 2009 presentation covering some of the ways spammers harvest email addresses on the Internet (and how you can prevent it), including an in-depth look at three commonly used software packages.

TRANSCRIPT

Produced in cooperation with: HP Technology Forum & Expo 2009

© 2009 Hewlett-Packard Development Company, L.P.

The information contained herein is subject to change without notice

Email Address Harvesting

Michael Lamont

Senior Software Engineer

June 17, 2009

Overview

• What is email address harvesting?

• How do spammers do it?

• What can you do about it?

• Examples of harvesting software

Mandatory Definition Slide

• Email address harvesting is the process used by spammers to extract email addresses from public sources.

• Common sources:

− Web sites

− Newsgroups

− Mailing lists

− Chat rooms

Mandatory “How Bad Is It?” Slide

• FTC: 86% of all email addresses posted on web pages receive spam.

• FTC: 93% of all email addresses used in newsgroups receive spam.

• PSC honeypot record: Address received spam 4 minutes after being included in a newsgroup post.

Address Lists

• Spammers use address harvesting to build giant lists of addresses to send spam to.

• Most lists have 1-20 million addresses.

• Spammers sell/share their lists, so being on even just one list will get you a lot of spam.

Evolution Of The Address List

• Somebody (probably not even a spammer) harvests addresses from various sources.

• A “good” harvester scrubs the list.

• The harvester sells the list to lots of spammers.

• Once your address is on a list, it’s going to be on one or more lists forever.

Harvesting From Web Sites

• Spammers usually use a spider program to scrape addresses off of web pages.

Harvesting From Web Sites

• Web directories make it easy to get lots of addresses

Usenet Newsgroups

• Email addresses are splattered all over newsgroup posts:

− Message headers

− Signatures

− Attributions

• Spider programs exist to extract these addresses as well.

Mailing Lists

• Lots of list manager software provides a list of every email address on a list.

• Spammers are happy to join a mailing list temporarily to get access to a list of subscribers.

• Some clever spammers repost an innocuous newbie question taken from the list archives, with a read-receipt request attached. Every receipt that comes back reveals a subscriber's address.

3rd Party Mailing Lists

• People you’ve given your address to pass it on to 3rd parties (usually for profit).

• Example: Auto insurance quote

• Initial sale of list might be aboveboard, but lists have a way of trickling down to less desirable senders.

Web Browser Holes

• Newer browsers have eliminated most of these, but they’re still common in older browsers.

• Extraction of the email address from the From request header (HTTP_FROM in CGI) that the browser sends to the web server.

• JavaScript to extract email address from browser’s configuration.

Web Browser Holes

• Force browser to fetch an image on a page by anonymous FTP.

− Most browsers use the configured email address as the password.

• JavaScript action that sends an email message in the background on page load.

Chat Rooms

• Web bots monitor chat rooms and extract user names.

• Lots of providers (AOL, Yahoo) use the same profile names for both chat rooms and email.

• IRC used to be fertile harvesting ground, but less savvy users have largely stopped using it.

Domain Contacts

• Every registered domain name has one or more contact addresses.

• Addresses are publicly accessible (WHOIS)

• Addresses are almost always valid and read by a real person on a regular basis.

Guessing

• Spammers “guess together” a list of email addresses.

• The addresses are tested against one or more email servers.

• Valid addresses are added to a list of addresses to be spammed.

• Usually referred to as directory harvesting.

CAN-SPAM

• The federal CAN-SPAM Act explicitly makes email address harvesting illegal.

• Some providers of the harvesting software have ceased and desisted, but harvesting has actually increased.

• Like most legal solutions, CAN-SPAM is severely constrained by jurisdictional boundaries.

Harvesting Prevention

• The harder it is for spammers to get your address, the harder it is for them to spam you.

• “I don’t care – my spam filter is awesome. Bring it on!”

• No filter is 100% accurate

• Filtering still places load on filtering system and/or email server.

Prevention Methods

• Reformatting addresses

• Web forms

• JavaScript-generated mailto links

• Graphical addresses

• Throwaway addresses

Reformatting Addresses

• Prevents harvesting from web pages and newsgroups.

• Simple examples include inserting bogus strings into the address to make it invalid:

jdoe@NOSPAM.hp.com

jdoeREMOVEME@hp.com
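
The two munging styles above can be sketched as small helpers. This is a minimal illustration, not a recommended library: the function names and the default marker strings are assumptions chosen to match the slide's examples.

```javascript
// Sketch: insert a bogus marker so the published address is invalid until a
// human removes the marker. Function names and markers are illustrative only.
function splitAddress(address) {
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) {
    throw new Error("not an email address: " + address);
  }
  return { user: address.slice(0, at), domain: address.slice(at + 1) };
}

// Appends the marker to the user name: jdoe -> jdoeREMOVEME
function mungeUser(address, marker = "REMOVEME") {
  const { user, domain } = splitAddress(address);
  return user + marker + "@" + domain;
}

// Prepends the marker as a fake subdomain: hp.com -> NOSPAM.hp.com
function mungeDomain(address, marker = "NOSPAM") {
  const { user, domain } = splitAddress(address);
  return user + "@" + marker + "." + domain;
}

console.log(mungeUser("jdoe@hp.com"));   // jdoeREMOVEME@hp.com
console.log(mungeDomain("jdoe@hp.com")); // jdoe@NOSPAM.hp.com
```

Note that the munged string still has the shape of an email address, so it may well be collected. The protection comes from the collected address being undeliverable.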

Reformatting Addresses

• Writing the address out longhand can prevent harvesters from recognizing it as an email address:

jdoe at hp dot com

• Inserting extra whitespace can also help:

jdoe @ hp.com

jdoe @ hp.com
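
A rough sketch of the longhand and whitespace rewrites follows. The `NAIVE_ADDRESS` pattern is a deliberately simple stand-in for plain pattern matching, used only to show why the rewritten text no longer looks like an address. It is an assumption and does not describe how any particular harvesting package works.

```javascript
// Sketch: rewrite an address so simple pattern matching no longer sees it.
function spellOut(address) {
  return address.replace(/@/g, " at ").replace(/\./g, " dot ");
}

function padAtSign(address) {
  return address.replace(/@/g, " @ ");
}

// Deliberately naive pattern (an assumption for illustration only).
const NAIVE_ADDRESS = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

const plain = "jdoe@hp.com";
console.log(spellOut(plain));                      // jdoe at hp dot com
console.log(padAtSign(plain));                     // jdoe @ hp.com
console.log(NAIVE_ADDRESS.test(plain));            // true
console.log(NAIVE_ADDRESS.test(spellOut(plain)));  // false
console.log(NAIVE_ADDRESS.test(padAtSign(plain))); // false
```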

Reformatting Addresses

• ASCII-encoded characters in the address are decoded by most web clients, but not by most spamware:

jdoe@p&#114;ocess&#046;com
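
As a sketch of how such an encoded address could be produced, the helper below replaces selected characters with decimal numeric character references, zero-padded to three digits like the slide's example. The function names and the choice of which characters to encode are assumptions. `decodeEntities` handles only the decimal form and is there just to show the round trip a web client performs.

```javascript
// Sketch: encode chosen characters as decimal numeric character references.
function toEntity(ch) {
  return "&#" + String(ch.codePointAt(0)).padStart(3, "0") + ";";
}

// Encodes only the characters listed in `chars` (default: "@" and ".").
function encodeChars(text, chars = "@.") {
  return Array.from(text, (ch) => (chars.includes(ch) ? toEntity(ch) : ch)).join("");
}

// Minimal decoder for decimal references only, mimicking what a browser does.
function decodeEntities(text) {
  return text.replace(/&#(\d+);/g, (_, n) => String.fromCodePoint(Number(n)));
}

const encoded = encodeChars("jdoe@hp.com");
console.log(encoded);                 // jdoe&#064;hp&#046;com
console.log(decodeEntities(encoded)); // jdoe@hp.com
```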

Web Forms

• Provide an HTML form for web site visitors to enter a message.

• When the form is submitted, the CGI script mails the message to the appropriate recipient.

• Avoids displaying the actual address anywhere on the site.

• Can still be abused, but it’s relatively difficult to do.
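
The server-side half of this approach can be sketched as below. Everything here is an assumption made for illustration: the `RECIPIENTS` table, the field names, and the `example.com` addresses are hypothetical, and the actual mail delivery step is left out. The point is that the form only carries a recipient key, and the real address is looked up on the server. Rejecting line breaks in the subject is one common guard against the kind of abuse the last bullet hints at.

```javascript
// Sketch: map a public recipient key to a private address on the server,
// so the address never appears in the page source. All names are hypothetical.
const RECIPIENTS = {
  sales: "sales@example.com",
  support: "support@example.com",
};

function buildMessage(fields, recipients = RECIPIENTS) {
  const key = String(fields.recipient || "");
  // Own-property check so keys like "constructor" are not accepted.
  if (!Object.prototype.hasOwnProperty.call(recipients, key)) {
    throw new Error("unknown recipient key");
  }
  const subject = String(fields.subject || "");
  // Line breaks in a header value could be used to smuggle in extra headers.
  if (/[\r\n]/.test(subject)) {
    throw new Error("line breaks are not allowed in the subject");
  }
  return { to: recipients[key], subject, body: String(fields.body || "") };
}

const message = buildMessage({ recipient: "sales", subject: "Quote", body: "Hello" });
console.log(message.to); // sales@example.com
```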

JavaScript-Generated mailtos

• Use JavaScript to dynamically generate the mailto: link when the link is clicked.

<A HREF='javascript:void(window.location="mail"+"to:"+"jdoe"+"@"+"hp"+"."+"com")'>Click here to mail John Doe</A>
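
The same idea can be written as a click handler rather than an inline javascript: URL. This is a sketch under assumptions: the function name and the element id `mail-jdoe` are hypothetical, and the browser-only wiring is guarded so the snippet also runs outside a browser.

```javascript
// Sketch: assemble the mailto: URL from fragments only when it is needed,
// so the complete address never appears as one string in the page source.
function buildMailto(user, domainLabels) {
  return ["mail", "to:", user, "@", domainLabels.join(".")].join("");
}

// Browser-only wiring; skipped when there is no DOM (for example under Node).
if (typeof document !== "undefined") {
  const link = document.getElementById("mail-jdoe"); // hypothetical element id
  if (link) {
    link.addEventListener("click", (event) => {
      event.preventDefault();
      window.location.href = buildMailto("jdoe", ["hp", "com"]);
    });
  }
}

console.log(buildMailto("jdoe", ["hp", "com"])); // mailto:jdoe@hp.com
```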

Graphical Addresses

• Displaying all or part of an email address as a graphical image will throw off most harvesting software.

• No known harvesting software is OCR-capable.

− Anecdotal reports of at least one large spam organization trying to develop accurate OCR harvesters

Graphical Address Complexity

• Graphical @ sign:

− Probably sufficient to throw off most harvesters.

− Username and hostname are still in close proximity.

− Works easily for multiple users/multiple domains.

jdoe hp.com
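
A sketch of markup generation for the graphical @ sign: the user name and host name stay as text and only the @ is an image. The image path `/images/at.png` and the function names are assumptions. The empty `alt` text keeps an "@" or "at" out of the page source, at the cost of being less friendly to screen readers.

```javascript
// Sketch: render user and host as text with an image standing in for "@".
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// imageUrl is a hypothetical path to a small picture of an @ sign.
function graphicalAtMarkup(user, domain, imageUrl = "/images/at.png") {
  return (
    escapeHtml(user) +
    '<img src="' + escapeHtml(imageUrl) + '" alt="">' +
    escapeHtml(domain)
  );
}

console.log(graphicalAtMarkup("jdoe", "hp.com"));
// jdoe<img src="/images/at.png" alt="">hp.com
```

Because one image serves every address, the same helper works unchanged for multiple users and multiple domains.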

Graphical Address Complexity

• Graphical @hostname:

− Should prevent any harvester from working.

− Requires a different image for each email domain.

jdoe

Graphical Address Complexity

• Graphical everything:

− For the truly paranoid.

− Completely unreadable by harvesters unless they’re OCR-enabled.

− Requires either a lot of images or a script that can dynamically generate them.

Throwaway Addresses

• Many people create an email account that they use only for web pages and newsgroups.

• Some software products go further and let you create an alias for every occasion.

• You still need a static address for business cards, resumes, etc.
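
One way an "alias for every occasion" might be generated is sketched below. It assumes a mail provider that delivers `user+tag@domain` to `user@domain` (plus-addressing), which not every provider supports. The function name, the tag format, and the `example.com` domain are hypothetical.

```javascript
// Sketch: derive a per-occasion alias, assuming plus-addressing is supported.
const crypto = require("crypto");

function makeAlias(user, domain, occasion) {
  // Reduce the occasion to a lowercase tag made of letters, digits, hyphens.
  const tag = occasion
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  // A short random suffix makes the alias hard to guess from the tag alone.
  const suffix = crypto.randomBytes(3).toString("hex");
  return `${user}+${tag}-${suffix}@${domain}`;
}

console.log(makeAlias("jdoe", "example.com", "Auto Insurance Quote"));
// e.g. jdoe+auto-insurance-quote-<6 hex digits>@example.com
```

If spam starts arriving at one alias, the tag shows which source leaked the address, and that alias can be filtered or retired without affecting the others.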

Harvesting Software

• Tons of specialized software (spamware) is used by spammers to harvest addresses.

• Most spamware is developed in Eastern Europe and Asia.

• We’re going to look at several of the most popular packages.

List Harvester

• Harvests addresses from web sites.

• “Targeted” harvesting - in theory, the harvested email addresses have something in common.

• Appears to be based in China.

• http://www.listharvester.com

• Price: $699 US

List Harvester - Method

• Performs a search for one or more keywords on the user’s choice of search engine.

• Parses every site returned by the search engine in order, looking for addresses and links.

• Follows links to other pages and parses them for addresses as well.

List Harvester - Screenshots

• Start screen

• Search terms entry

• Search parameters

• Search filters

• Parsing engine options

• Saving list of extracted addresses

• Harvesting in progress

Atomic Email Hunter

• Harvests addresses from web sites.

• Either scans an entire web site for addresses or performs a “targeted search” like List Harvester.

• Based in Russia, most likely Moscow.

• http://www.massmailsoftware.com/

• Price: $79.85 US

Atomic Email Hunter - Screenshots

• Start screen

• Web download settings

• Address filtering settings

• Run

• Results

Fast Newsgroups Extractor

• Harvests addresses from newsgroups.

• Has a companion web site extractor that’s very similar to Atomic Email Hunter.

• Based in Russia, most likely Moscow.

• http://www.lencom.com

• Price: $79.00 US

Fast Newsgroups Extractor - Method

• Lets user select one or more newsgroups to extract content from.

• Downloads multiple messages simultaneously from the NNTP server.

• Extracts addresses from the downloaded messages.

• Has the ability to limit downloaded messages to those that contain certain text in the subject.

Fast Newsgroups Extractor - Screenshots

• Start screen

• News server setup

• Newsgroup list download

• Newsgroup selection

• Harvesting job setup

• Run

Quick Review

• We talked about:

− What email address harvesting is

− What data sources are harvested

− How you can protect your addresses

− 3 software packages used by spammers to harvest addresses
