icsi honeyfarm status

The ICSI Honeyfarm Cui, Paxson, Weaver

ICSI Honeyfarm Status

Weidong Cui, Vern Paxson, Nicholas Weaver


General Concept: A Breeding Ground for Worms

• We want a controlled, automatic breeding ground for worms and other self-propagating attacks:
  – A worm attacks a "monitored" address and begins to propagate in our system
• As the worm propagates, we have a suite of automatic analyzers to study the worm
  – What can it infect?
  – Any particulars of interest?
  – How does it attack?
• And automatically analyze defense strategies
  – Does this signature block the worm?
• All within a very short time: a few seconds
  – And with a single point of trust for exporting information
• Also want to leverage the infrastructure for detecting other things:
  – Human attackers/non-self-propagating attacks
  – Non-random worms


Honeyfarm: Objectives

• We use network telescopes and a honeyfarm to detect scanning worms
  – Network Telescopes
    • Distributed unallocated IP address ranges
  – Honeyfarm:
    • Centralized cluster of honeypots
    • On-demand: emulating a large number of hosts on a small number of honeypots
• Detecting self-propagation
  – Detect self-propagation inside the honeyfarm by redirecting propagations from one honeypot to other honeypots
• Other detectors possible:
  – Tripwire/modification detectors
  – Monitored honeypots, etc…
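The on-demand allocation and redirection idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Controller` class, its method names, and the honeypot identifiers are hypothetical and not taken from the ICSI implementation. A honeypot is bound to a telescope address on first contact, outbound attempts from a honeypot are redirected to another honeypot, and self-propagation is flagged when a honeypot that was itself reached by a redirected attempt starts attacking in turn.

```python
class Controller:
    """Hypothetical controller: maps many addresses onto a few honeypots."""

    def __init__(self, honeypots):
        self.free = list(honeypots)   # honeypots not yet bound to an address
        self.by_addr = {}             # emulated address -> honeypot id
        self.parent = {}              # honeypot -> honeypot that attacked it

    def allocate(self, addr):
        """Bind a honeypot to an address the first time it is contacted."""
        if addr not in self.by_addr:
            if not self.free:
                raise RuntimeError("honeypot pool exhausted")
            self.by_addr[addr] = self.free.pop(0)
        return self.by_addr[addr]

    def on_outbound(self, src_honeypot, dst_addr):
        """Redirect an outbound attempt to another honeypot.

        Returns (target honeypot, self_propagation_detected). Propagation is
        flagged when the source was itself infected via a redirected attempt.
        """
        target = self.allocate(dst_addr)
        detected = src_honeypot in self.parent
        if target != src_honeypot:
            self.parent.setdefault(target, src_honeypot)
        return target, detected


if __name__ == "__main__":
    c = Controller(["hp1", "hp2", "hp3"])
    first = c.allocate("10.1.2.3")                  # external attacker hits a telescope address
    print(first, c.on_outbound(first, "192.0.2.9"))
    print(c.on_outbound("hp2", "198.51.100.7"))     # second hop: flagged as propagation
```

Requiring a second hop before raising the flag is a design choice of this sketch; a real detector could also compare the payloads of the two hops.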

[Figure: honeyfarm overview with the labels Global Internet, Controller, Honeypots, and Honeyfarm]


The Overall Goal

• Framework for automatically detecting and analyzing new worms and other attacks
  – For self-propagating attacks, we want to generate:
    • Vulnerability signatures: What is vulnerable
    • Behavior signatures: What the worm needs to propagate
    • Attack signatures: Signatures which detect and block the attack
    • All signatures should be verified for effectiveness
  – For non-self-propagating attacks, as much of the above as possible
  – Based on providing a fertile ground for constrained propagation
• Receive data from multiple sources
  – Small distributed telescopes, Large telescopes
  – Spam, Crawling?
• For a RANDOM worm, with k addresses, V victims, and M systems infected:
  – $P_{detect} = 1 - ((V-k)/V)^M$ after M machines infected
  – High probability of detection when M = V/k
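The detection formula above is easy to evaluate numerically. The sketch below uses the slide's symbols k, V, and M directly; the concrete numbers (four /16s monitored, V set to 2^32) are illustrative assumptions, not figures from the talk.

```python
def detection_probability(k: int, v: int, m: int) -> float:
    """P_detect = 1 - ((V - k) / V) ** M, using the slide's symbols."""
    if v <= 0 or not 0 <= k <= v or m < 0:
        raise ValueError("need V > 0, 0 <= k <= V, and M >= 0")
    return 1.0 - ((v - k) / v) ** m


if __name__ == "__main__":
    k = 4 * 2**16      # assumed: four /16s of monitored addresses
    v = 2**32          # assumed value for V, chosen only for illustration
    for m in (1_000, v // k, 10 * (v // k)):
        print(f"M={m:>7}  P_detect={detection_probability(k, v, m):.3f}")
```

At M = V/k the expression is approximately 1 - 1/e, or about 63%, and it approaches 1 quickly as M grows past that point.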


ICSI's Honeyfarms

• Honeyfarm Safety
• ICSI's features:
  – Windows Centric
  – Hot Telescope
  – Replay
    • Replay-based filtering
  – Spam Telescope
• The Main ICSI Honeyfarm
• Other possibilities:
  – "Run this" Wormholes


ICSI Focus: Windows

• Microsoft Windows is our primary (currently only) hosted OS
• This requirement dictates VM choice:
  – VMWare Workstation or ESX server
  – Workstation: prototyping
    • Limited scalability
    • Runs on everything
  – ESX Server: production
    • Stringent hardware requirements
    • Memory sharing for (some) scalability
      – Could be better
      – But can work across multiple close variants due to coalescing
• For now, NO host-OS specific customization
  – Dictates mechanism for demand allocation: NAT, instead of customization
  – Allows the possibility of non-virtual honeypots as well
    • ?Apple Systems?


ICSI's Architecture

[Figure: architecture diagram with components labeled Attacker, Network Telescope, GRE Tunnel, Honeyfarm, Filtering, Mapping, Containment, Policing, Detection, VManager, and VM Clusters]


Note on Architecture

• Most components implemented in Click
  – Provides a modular, reusable framework
• Components in red we want to merge with UCSD
  – Need to better coordinate in this area
  – Relatively low overlap so far, but need


Safety: A Common Focus of Both UCSD and ICSI

• What if a worm propagates through the honeyfarm and then infects somebody else?
  – "But they would get infected anyway" doesn't cut it…
• Two safety features:
  – Containment: the basic decision making on what is allowed outbound
    • Connections back to the infecting host
    • Some "phone-home" channels may also be allowed
      – Much malcode (and many attacks) grabs code from a third-party site
  – An independent policing module
    • Shuts down the honeyfarm once it detects any abnormal behavior on outbound connections
    • This is a safety belt; it should NEVER actually be invoked
• Want a third safety feature as well:
  – A monitoring system which observes the control plane
  – Has the ability to turn off the honeyfarm by power-sequencing the network connections
• Many more details on policies in UCSD's talk
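The split between containment and independent policing can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class names, the string addresses, and the rule that any unexpected outbound destination trips the shutdown. The actual policies are those described in the talks, not this code.

```python
class Containment:
    """Decides what a honeypot may send outbound (hypothetical policy)."""

    def __init__(self, phone_home=()):
        self.phone_home = set(phone_home)   # assumed allow-list of third-party sites
        self.infector = {}                  # honeypot -> host that first contacted it

    def record_inbound(self, honeypot, remote):
        self.infector.setdefault(honeypot, remote)

    def allows(self, honeypot, dst):
        # Allowed: connections back to the infecting host, or phone-home channels.
        return dst == self.infector.get(honeypot) or dst in self.phone_home


class Policer:
    """Independent safety belt watching traffic that actually left."""

    def __init__(self, allowed_destinations):
        self.allowed = set(allowed_destinations)
        self.shut_down = False

    def observe_outbound(self, dst):
        # Any abnormal destination shuts the honeyfarm down, and it stays down.
        if dst not in self.allowed:
            self.shut_down = True
        return not self.shut_down


if __name__ == "__main__":
    c = Containment(phone_home={"203.0.113.10"})
    c.record_inbound("hp1", "198.51.100.4")
    print(c.allows("hp1", "198.51.100.4"), c.allows("hp1", "192.0.2.1"))
    p = Policer({"198.51.100.4", "203.0.113.10"})
    print(p.observe_outbound("192.0.2.1"), p.shut_down)
```

Keeping the policer's allow-list separate from the containment state means a bug in the containment logic cannot silently disable the safety belt.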


The Telescopes

• We have 4 /16s arranged as two (almost) contiguous /15s belonging to ESNet…
  – Network is directly advertised and routed by ESNet
• But we also have, on loan, a "special" /23 netblock
  – Also advertised and routed by ESNet
• Much malcode is NOT random:
  – Linear scanners starting from the local address:
    • Blaster and others
  – Local subnet preference
    • Nimda, etc.
• By selecting highly-likely addresses, we can gain an advantage in detection time
  – Local subnet preferences in particular have proven very effective


How Hot Is The Hot Address Range?

[Figure: bar chart of the number of TCP connections (y-axis, 0 to 160000) for the /23 netblock and each of the four /16 netblocks]


Filtering

• But we can't allow all communication:
  – Honeypot allocation/deallocation is very expensive for us
    • VMWare doesn't support a lightweight clone
• We want to filter out known threats
  – But we still want to detect new attacks for existing vulnerabilities
    • We want to detect Welchia as well as Blaster:
      – New attacks may require new signatures
      – New variants may be substantially more disruptive
  – And we would like to avoid identification by attackers as a honeypot system
• Thus we need a low-cost mechanism to say whether an attack is worth forwarding to a real honeypot


Basic Filtering

• Scan filtering
  – Allow traffic to the first N destinations from a source.
  – Intuition: Scans from a source are homogeneous
• Init-Data filtering
  – Detect known attacks by looking at the first data transfer from a source
  – Intuition: Many simple attacks (e.g., CodeRed, Blaster, Slammer) can be filtered.
  – Scheme: Acknowledge SYNs and any data packets following them
    • University of Michigan scheme
• Is this enough?
  – Far too many active sources on the Internet
  – No, many attacks require complicated "conversations" before exposing their unique malicious intention
    • See Pang et al "Characterizing Internet Background Radiation"
• Application-level responders are expensive in terms of development
  – Also, can't do "cut-through forwarding" if the attack deviates from the known script
• Our idea: replay-based filtering
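The two basic filters above can be sketched as follows. This is a minimal illustration, not the deployed Click elements: the class names are hypothetical, and matching the first payload by exact SHA-256 hash is an assumption chosen for simplicity (a real filter would likely need a looser match).

```python
import hashlib
from collections import defaultdict


class ScanFilter:
    """Allow traffic from a source to its first N distinct destinations."""

    def __init__(self, n: int):
        self.n = n
        self.seen = defaultdict(set)   # source -> destinations already admitted

    def allow(self, src: str, dst: str) -> bool:
        dsts = self.seen[src]
        if dst in dsts:
            return True                # already admitted for this source
        if len(dsts) < self.n:
            dsts.add(dst)
            return True
        return False                   # beyond the first N: assume more of the same scan


class InitDataFilter:
    """Recognize known attacks from the first data payload a source sends."""

    def __init__(self, known_payloads):
        self.known = {hashlib.sha256(p).hexdigest() for p in known_payloads}

    def is_known(self, first_payload: bytes) -> bool:
        return hashlib.sha256(first_payload).hexdigest() in self.known


if __name__ == "__main__":
    scan = ScanFilter(n=2)
    print([scan.allow("attacker", d) for d in ("a", "b", "c", "a")])
    init = InitDataFilter([b"known-exploit-bytes"])
    print(init.is_known(b"known-exploit-bytes"), init.is_known(b"something new"))
```

Neither filter can handle attacks that only reveal themselves after several exchanged messages, which is the gap replay-based filtering is meant to close.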


Application Independent Replay

• Positively identifying a probe as coming from a known or unknown source requires a complex dialog
  – E.g., Windows SMB file transfer
• We can't build target-specific responders
  – Too many variants and new targets
• Can we use an existing dialog as a script for replaying an application session?
  – Take one or two instances of a dialog
    • E.g., a recorded attack by a particular worm against one of our honeypots
  – Recognize certain idioms:
    • Addresses, ports, and names encoded in the dialog
    • Ports which open for subsequent transfers
    • "Cookies" or session identifiers
    • Length fields
    • Prestated arguments
• Then use the current interaction as a guide
  – Update ports/addresses/subsequent connections as appropriate
  – Mimic back cookies and other changes
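To make the "update addresses, mimic cookies, fix length fields" step concrete, here is a toy sketch. It assumes a made-up framing (a 2-byte big-endian length prefix followed by the body) and assumes the dynamic fields have already been located in the recorded dialog; the hard part of the real system, inferring those fields without protocol knowledge, is not shown.

```python
import struct


def frame(body: bytes) -> bytes:
    """Hypothetical wire format: 2-byte big-endian length, then the body."""
    return struct.pack("!H", len(body)) + body


def adapt_message(recorded: bytes, substitutions: dict[bytes, bytes]) -> bytes:
    """Rewrite a recorded message for the live session.

    `substitutions` maps values seen in the recorded dialog (addresses,
    cookies, names) to the values observed in the current interaction.
    The length field is recomputed after the body changes.
    """
    (length,) = struct.unpack("!H", recorded[:2])
    body = recorded[2:2 + length]
    for old, new in substitutions.items():
        body = body.replace(old, new)
    return frame(body)


if __name__ == "__main__":
    recorded = frame(b"GET file from 10.0.0.5 session=abc123")
    live = adapt_message(recorded, {b"10.0.0.5": b"192.168.10.77", b"abc123": b"zz9"})
    print(live)
```

Recomputing the length prefix matters because the substituted address is longer than the recorded one; replaying the old length would produce a malformed message.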


Responder-Side Replay

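[Figure: two message sequences side by side. Original flow: Attacker and Victim exchange messages 1 to 5, ending with "Infected!". Replay flow: Attacker and Filter exchange messages 1', 2', 3', 4', and 5, ending with "Detected!".]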


Replay Status

• This works for single dialogs
  – For both the initiator (client) and responder (server)
• Tested with:
  – NFS file manipulation
  – FTP file transfer
    • Including changing the filename argument for the client
  – CIFS/SMB file transfers
  – The Blaster worm
  – W32.Randex.D worm
    • Performs attack through open file shares
• Currently expanding to support multiple, simultaneous dialogs
  – Primarily for server-side replay to act as a radiation filter
  – Possibility: Recognize commands by where dialogs diverge?
• Also desire replay for:
  – "Toxicology Screen": For this attack, what can get infected
  – Testing network devices, evaluating servers, interacting with Internet servers for measurement purposes…


Replay-Based Filters

• There are 1700 different application dialogs among 143224 connections to port 445/tcp
  – Connections to active honeypots
  – Used tethereal to generate a one-line summary for each data packet
  – Formulated each dialog in a canonical format
• Want to ignore anything in the "known" dialogs set, while allowing anything in the "unknown" set
• So use replay:
  – Replay as the server with the group of known dialogs
    • If replay successful, classify and ignore that source
  – If replay fails, begin replaying the new dialog against a honeypot as the client
    • Using the previous dialog as the starting script
    • Also, mark source as unknown and allow it to contact a live honeypot if seen again
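The decision logic above can be sketched as a small state machine. The representation is an assumption: each dialog is reduced to a tuple of one-line request summaries (the strings used in the example are invented, not real tethereal output), and "replay successful" is approximated by the attacker's requests following some known dialog step by step.

```python
class ReplayFilter:
    """Hypothetical sketch of the known/unknown dialog decision."""

    def __init__(self, known_dialogs):
        self.known = [tuple(d) for d in known_dialogs]
        self.ignored = set()    # sources classified as known radiation
        self.unknown = set()    # sources that diverged from every known dialog

    def matches_known(self, requests) -> bool:
        """True if the requests follow at least one known dialog step by step."""
        candidates = self.known
        for step, req in enumerate(requests):
            candidates = [d for d in candidates if step < len(d) and d[step] == req]
            if not candidates:
                return False    # replay failed: no known script fits
        return True

    def handle(self, source, requests) -> str:
        if source in self.unknown:
            return "forward-to-honeypot"       # seen before and still unexplained
        if self.matches_known(requests):
            self.ignored.add(source)
            return "ignore"
        self.unknown.add(source)
        return "replay-against-honeypot"       # act as client using the new dialog


if __name__ == "__main__":
    f = ReplayFilter([("negotiate", "session-setup", "tree-connect")])
    print(f.handle("198.51.100.4", ["negotiate", "session-setup", "tree-connect"]))
    print(f.handle("203.0.113.9", ["negotiate", "unexpected-request"]))
    print(f.handle("203.0.113.9", ["negotiate"]))
```

Exact string comparison of summaries is the simplest possible canonical form; it stands in for whatever canonicalization the real system applies.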


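[Figure: message sequence among Attacker, Filter, and VM. Messages 1 to 5 appear twice, under the labels Responder-Side Replay and Initiator-Side Replay, with the annotations "Known?" and "Infected!".]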


The Spam Telescope

• Half of the emails to @acme.com are sent to our email server
  – 100,000 messages per day
  – 6000 unique executables in 4 days
• We implemented a real-time process to parse emails and retrieve attachments
  – Hash attachments to gain some statistics
• We plan to run attached executables on our honeypots to detect new email worms or multimode worms
  – Use email to penetrate the firewall, then exploit with local exploits
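Parsing messages and hashing attachments, as described above, can be done with the Python standard library alone. The sketch below is an illustration rather than the ICSI pipeline: the function name is hypothetical and the choice of SHA-256 is an assumption, since the slide does not say which hash is used.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def attachment_hashes(raw: bytes) -> list[tuple[str, str]]:
    """Return (filename, sha256 hex digest) for each attachment in a raw message."""
    msg = message_from_bytes(raw, policy=policy.default)
    results = []
    for part in msg.walk():
        name = part.get_filename()
        if name is None:
            continue                           # body text or multipart container
        payload = part.get_payload(decode=True)   # undo base64/quoted-printable
        if payload is None:
            continue
        results.append((name, hashlib.sha256(payload).hexdigest()))
    return results


if __name__ == "__main__":
    # Build a small test message with one fake executable attachment.
    m = EmailMessage()
    m["To"] = "user@acme.com"
    m.set_content("see attached")
    m.add_attachment(b"MZ\x90\x00fake", maintype="application",
                     subtype="octet-stream", filename="update.exe")
    for name, digest in attachment_hashes(m.as_bytes()):
        print(name, digest)
```

Counting distinct digests over a window of messages gives the kind of "unique executables" statistic quoted on the slide.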


The Main Honeyfarm

• Located at LBNL in ESNet's machine room
  – Designed around HP DL360 G4 1U, dual processor servers
• Currently:
  – 1 server as "head unit"
    • Previous head was a DL380, but suffered a catastrophic motherboard failure
  – 7 servers running ESX for honeypots
• Near term expansion (next couple of weeks)
  – Convert one ESX server into raw Linux for processing acme.com email
    • Attach 3 TB disk array for tertiary storage
  – Add 6 more 1U servers
  – Add a redundant switch
  – Increase the disk space on the existing servers
• Generous support from:
  – ESNet: Network connectivity and rackspace
  – Hewlett Packard: Equipment
  – Microsoft: OS and software licenses
  – VMWare: VMWare licenses


Possibility: The "Run This" Wormholes

• We also want small, easy to use endpoints:
  – Distributed secrets
  – Endpoints in LANs
  – Nonblacklistable endpoints for crawlers
• Our plan is to create a "Run This" endpoint in Click
  – Creates a new MAC address derived from the host's MAC
    • Obtain DHCP lease
    • Open GRE tunnel to the specified honeyfarm
  – All traffic is forwarded through the tunnel
  – Outgoing traffic is strongly policed by the "Run This" module:
    • Limited fanout
    • No contacting local addresses
    • ?What to do about LAN broadcast packets?
• Goal is an easy to use and trustable endpoint
  – Which does not trust the honeyfarm.
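The outbound policing rules listed above (limited fanout, no local addresses) can be sketched with the standard `ipaddress` module. The planned endpoint is a Click module, so this Python class is only an illustration; the class name, the fanout limit, and the example networks are assumptions, and the open question about LAN broadcast packets is deliberately left unhandled.

```python
import ipaddress


class OutboundPolicer:
    """Hypothetical policer for traffic leaving a "Run This" endpoint."""

    def __init__(self, local_network: str, max_fanout: int):
        self.local = ipaddress.ip_network(local_network)
        self.max_fanout = max_fanout
        self.destinations = set()      # distinct remote addresses contacted so far

    def allow(self, dst: str) -> bool:
        addr = ipaddress.ip_address(dst)
        if addr in self.local:
            return False               # never contact hosts on the local LAN
        if addr in self.destinations:
            return True                # already counted against the fanout limit
        if len(self.destinations) >= self.max_fanout:
            return False               # fanout limit reached
        self.destinations.add(addr)
        return True


if __name__ == "__main__":
    p = OutboundPolicer("192.168.1.0/24", max_fanout=2)
    for dst in ("192.168.1.20", "198.51.100.1", "198.51.100.2", "198.51.100.3"):
        print(dst, p.allow(dst))
```

Because the endpoint enforces these rules itself, the host running it does not have to trust the honeyfarm on the other end of the tunnel.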
