FireSense: Firewall-Based Occupancy Sensing
Richard Yu
University of California, Los Angeles
Table of Contents
1. Introduction
2. Background / Related Work
3. Tools
   3.1. PFSense
   3.2. TShark
   3.3. SVM
   3.4. HMSVM
4. Implementation
   4.1. The Sensor
        TShark
        Data Aggregator
   4.2. The Data Server
   4.3. Determining Occupancy
        Peer Connections
        Router-Level Statistics
   4.4. Gathering the Ground Truth
5. Evaluation
   5.1. Binary Results
   5.2. Estimating Actual Occupancy
   5.3. Network Traffic vs. Occupancy
6. Conclusion
7. Acknowledgments
8. References
9. Appendix
Abstract
As the field of occupancy detection grows, more and more sensors are being developed or repurposed into occupancy detectors. Each new sensor brings with it additional maintenance and power costs. FireSense is a firewall-based sensor that repurposes existing infrastructure to add sensing capability with little overhead. The sensor monitors network traffic across the firewall's coverage space and reports network traffic statistics and events. Here, we show that this information can be used for occupancy detection, though that is by no means the limit of FireSense's abilities.
1. INTRODUCTION
Detecting occupancy has been a popular area of research in recent years for its potential in appli-
cations including security and energy management. If a room or building is uninhabited, the lights
can be safely turned off, the HVAC system can be relieved, and even computing power can be
transferred to remote processes. Over time, the decreasing energy usage can add up to a large
amount of saved expenses. Using sensors can offload the responsibility of turning off lights and
appliances from humans onto automated systems.
FireSense is a novel sensor system that monitors internet traffic to determine occupancy. The sen-
sor is a software package built to work on a PFSense firewall that monitors incoming and outgoing
traffic. The advantages of the FireSense system are its low cost, unique area of sensing coverage,
and broad set of information. Often, sensors are single-function hardware that incur a separate
energy overhead just for monitoring occupancy. Because FireSense runs on a firewall, the in-
creased energy cost is minimal since the hardware can be used for other purposes and may already
be in place.
Further, many sensors have gaps in their ability to sense user presence. Motion detectors, the most popular occupancy indicator, often fail to notice people working relatively motionless at their computers. This leads to situations where people find themselves at their desks as the lights turn off, forcing them to wave their arms to re-trigger the sensor. FireSense covers exactly this case by detecting when a person is actively using their computer.
Data from FireSense can be combined with other sensors to create a more accurate and precise
view of an area’s level of occupancy and the behavior of the occupants.
Lastly, the FireSense sensor can provide much more detailed and broad information about network
usage than just the occupancy. A wide variety of individual and aggregate statistics and events can
be captured by the sensor and processed later on. Currently, occupancy is calculated on a backend
system using data captured from the sensor. An occupancy decision is made based on the results
of the Hidden Markov Support Vector Machine (HM-SVM) outlined by Altun et al. (2003).
2. BACKGROUND / RELATED WORK
Much work has been done to classify internet usage and connection information based on TCP
communications. Often, these methods would involve collecting packet information and commu-
nication statistics and feeding them into a machine learning algorithm. Popular inputs include TCP
port number, packet length statistics, flow duration, and packet inter-arrival time. (Nguyen &
Armitage, 2008) Moore and Zuev touched on the importance of using only packet arrival statistics
and packet header information to maintain user privacy. Their goal was to create a system that
could classify packets between bulk (ftp), databases, interactive (e.g. SSH, telnet), mail, services
(e.g. DNS, NTP), web access (www), peer-to-peer, attacks (worms and viruses), games, and mul-
timedia. (Moore & Zuev, 2005)
FireSense takes after IP classification technologies in two ways: 1) FireSense uses only packet header information, including TCP ports and packet lengths; and 2) FireSense follows the process of gathering network data and feeding it to a machine learning algorithm to reach a classification decision. However, FireSense also differs from traffic classification. Instead of stream-level information, much of FireSense's data consists of router-level traffic aggregates. Such coarse information suffices because FireSense's decisions are not as fine-grained. Moreover, given the volume of the traffic and the limitations of the PFSense hardware used in the experiment, gathering finely detailed information could bog down processing too much for the sensor to run in real time.
IP Traffic classification technology and FireSense may be combined in the future. Being able to
classify flows into communication types can greatly benefit occupancy decisions. Instant messag-
ing and browser-based applications are more likely to indicate occupancy than bulk and peer-to-
peer traffic. In the meantime, though, simpler methods must be used to classify traffic, including
port numbers and header fields.
3. TOOLS
3.1. PFSense
Firewalls are often used in workspaces today to monitor and control the flow of traffic and to
separate networks. Because networks are often associated with physical spaces (e.g. a lab or office
space), monitoring traffic through a firewall can often be analogous to monitoring traffic for a
space. Throughout this paper, we assume the space of interest is covered uniquely by a single firewall through which all user-generated traffic passes.
PFSense is a free and open source implementation of a firewall and router built atop FreeBSD. It
utilizes a web interface that allows administrators to create and manage user accounts, IP traffic
rules, forwarding, and traffic routing. Settings can also be managed by using a terminal directly on
the PFSense box. Using the latter method, the shell command line terminal can be accessed and
used as a normal FreeBSD operating system. (PFSense)
For testing, we used an ALIX.2 series dedicated PFSense board and connected to the terminal via
the serial connection. Files were downloaded onto the firewall using the web interface and the
sensor programs were run through the serial connection.
3.2. TShark
TShark is a terminal program used to capture, monitor, and extract information from packets on a computer's network interfaces. The program is the basis for the better-known Wireshark, which is a graphical user interface (GUI) for TShark's functions. Both programs are controlled by a set of input and output filters that let the user control which packets should be captured or ignored and what data should be gathered and reported for captured packets. To process packets, TShark first captures them and stores them in a file. From the file, TShark can process and extract the packet information for formatted output. (Wireshark)

Table 1 Captured ports used in FireSense. Compiled from (Well known TCP and UDP ports used by Apple software products, 2012), (Red Hat Enterprise Linux 4: Security Guide Appendix C. Common Ports, 2012), and (Service Name and Transport Protocol Port Number Registry, 2012).

PORT NAME DESCRIPTION
21 ftp FTP Control (Command)
22 ssh Secure Shell (SSH) service
23 telnet Telnet service
24 - Private mail system
80 http HyperText Transfer Protocol (HTTP)
110 pop3 Post Office Protocol v3
113 auth Authentication and Ident protocols
143 imap Internet Message Access Protocol (IMAP)
194 irc Internet Relay Chat (IRC)
220 imap3 Internet Message Access Protocol version 3
443 http_secure Secure Hypertext Transfer Protocol (HTTPS)
993 imaps Internet Message Access Protocol over Secure Sockets Layer (IMAPS)
994 ircs Internet Relay Chat over Secure Sockets Layer (IRCS)
995 pop3s Post Office Protocol version 3 over Secure Sockets Layer (POP3S)
1723 pptp Microsoft Point-to-Point Tunneling Protocol
2195 - Apple Push Notification service
2196 - Apple Push Notification - Feedback
3031 - Remote AppleEvents
3283 NetAssistant Apple Remote Desktop
3389 RDP Windows Remote Desktop Protocol
3724 blizwow World of Warcraft online gaming (MMORPG)
3784 - VoIP port used by Ventrilo
5190 - ICQ and AOL Instant Messenger
5222 xmpp Extensible Messaging and Presence Protocol
5900 VNC Apple Remote Desktop
5988 wbem-http Apple Remote Desktop
6665 irc Internet Relay Chat (IRC)
6666 irc Internet Relay Chat (IRC)
6667 irc Internet Relay Chat (IRC)
6668 irc Internet Relay Chat (IRC)
6669 irc Internet Relay Chat (IRC)
8080 http-alt HTTP Alternative
For FireSense, the TShark invocation is set to create a circular buffer of ten files, each 512 KB in size. Using a circular buffer keeps TShark's data space bounded so it does not eventually run the firewall out of memory. However, since the circular buffer continually overwrites itself, if the packet capture rate exceeds the packet processing rate, the input and output iterators will eventually collide and cause TShark to crash. Although rarely a problem on a small (personal) scale, in a lab setting this can bring the sensor down after only a few minutes.
FireSense mitigates the risks and effects of TShark crashes in two ways: 1) the range of captured packets is reduced through the capture filter; and 2) FireSense can detect a crash and will restart itself. Captured packets are limited to TCP connections using a limited set of ports. The ports currently identified as important for FireSense are listed in Table 1.
3.3. SVM
Support Vector Machine (SVM) algorithms are a classification technology used in many applica-
tions. SVM algorithms take in N-dimensional data points and plot them on an N-dimensional
graph. Data points are then compared against an N-1-dimensional plane and classified as greater
or less than the plane. Once determined, the plane is used as a decision boundary to separate two
classifications of data points. In FireSense, the classifications used are "occupied" and "non-occupied". Planes can take on many shapes, defined by the SVM's kernel; popular kernels include linear, polynomial, and radial basis function (RBF). FireSense uses a linear kernel, declaring occupancy when a data point lies on the occupied side of the plane.
To create the decision boundary, data points with known classifications are input into the SVM
classifier. The classifier iterates through several boundary possibilities until it finally settles on
one with a small enough overall classification error. Once trained, an SVM classifier can be used
to classify new data points. The accuracy of the SVM is often gauged on how well it can classify
a data set not used in the training process.
The advantage of SVM is its ability to relate data channels to each other. Instead of considering
each channel individually, all dimensions are combined into a single formula that can have positive
or negative relationships between different data channels. FireSense uses SVM to correlate and
generate decision boundaries on a wide set of data channels.
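As an illustration of the training process, here is a minimal linear SVM trained by sub-gradient descent (a Pegasos-style sketch, not the solver FireSense actually uses), on invented two-channel traffic features where +1 stands for "occupied" and -1 for "non-occupied":

```python
def train_linear_svm(points, labels, lam=0.01, epochs=2000):
    """Minimal linear SVM via hinge-loss sub-gradient descent.

    points: list of feature vectors; labels: +1 (occupied) / -1 (non-occupied).
    Returns (weights, bias) defining the separating hyperplane.
    """
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    t = 0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decaying learning rate
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # margin violated: regularize and step toward the point
                w = [(1 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
            else:
                # margin satisfied: regularization shrink only
                w = [(1 - eta * lam) * wi for wi in w]
    return w, b

def classify(w, b, x):
    """Side of the hyperplane: +1 occupied, -1 non-occupied."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

Training on points with known classifications and then scoring a held-out set, as described above, is exactly how the boundary's accuracy would be gauged.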
A limitation of SVM is that it is time-independent, while an occupancy inference can greatly benefit from neighboring data. If nobody is in the area but network traffic suddenly surges, there is a high probability that someone has just entered and started using a computer. If network access is currently low but has been high in the recent past, there may simply be a lull in usage while the space remains occupied. To account for time-based traffic and its effect on state changes, Hidden Markov Models were added to SVM to create a better model for occupancy.
3.4. HMSVM
Hidden Markov SVM is a system proposed by Y. Altun et al. in 2003. (Altun, Tsochantaridis, &
Hofmann, 2003) The system uses a Hidden Markov Model (HMM) to create transition labels based
on past and future behavior and classifications. These transition labels are then added to the data
as separate data channels before being fed into the SVM classifier. The result is a time-aware,
stateful data classifier. FireSense uses SVMhmm, an implementation of the Altun paper created at Cornell University on top of SVMstruct. (Joachims, 2012)
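SVMhmm consumes SVMlight-style training files in which each line is one token of a sequence and a `qid` groups the tokens of a sequence in order. A serializer for one time-ordered sequence of binned feature vectors might look like the following; the label values (1 for non-occupied, 2 for occupied) and the zero-skipping are illustrative conventions, not FireSense's actual choices.

```python
def to_svmhmm_lines(sequence_id, labels, feature_rows):
    """Serialize one sequence into SVM^hmm training lines:
    "label qid:SEQ featnum:value ..." with 1-based feature numbering."""
    lines = []
    for label, row in zip(labels, feature_rows):
        # sparse encoding: omit zero-valued features
        feats = " ".join(f"{i}:{v}" for i, v in enumerate(row, start=1) if v != 0)
        lines.append(f"{label} qid:{sequence_id} {feats}".strip())
    return lines
```

Each day's time bins would form one sequence, so the HMM transition structure is learned within days rather than across them.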
4. IMPLEMENTATION
The FireSense system shown in Figure 1 consists of three major components: the sensor, the
server, and the processor. The sensor monitors IP traffic, aggregates the data, and reports it to the
server. The server stores information for retrieval from the processor. The processor downloads
information from the server and applies the HMSVM classification to determine occupancy. When
taken off the test bed, the processor can be moved onto the server.
4.1. The Sensor
FireSense’s Sensor module sits on the PFSense firewall. Its goal is to listen to network traffic,
aggregate the data, and send it to the data server. Figure 2 shows the block diagram for the Sensor
module. TShark is responsible for collecting packets and reporting them to the data aggregator.
The data aggregator is a multithreaded Python program that stores data metrics and events and reports them to the server. Reports normally occur at regular intervals, but stream events are sent to the data server as they happen. Streams occur between two IP addresses and exist as long as the tracked TCP connection exists between the peers. For the rest of this paper, stream events will refer to a stream's opening and closing.
Figure 1 Block diagram showing the three major components of FireSense. 1) The Sensor is hosted on the PFSense firewall box. 2) The Server is hosted on a virtual machine. 3) The Processor is hosted on a backend PC.

TShark
Certain settings must be passed into TShark for it to correctly connect to the rest of FireSense. The first setting is to reduce the scope of captured packets to those with ports listed in Table 1. This serves the dual purposes of increasing the sensor's maximum bandwidth and limiting the listening
ports, making it easy to filter the data and network metrics to only what is applicable to the appli-
cation. The second setting is to create a ring buffer of files in which TShark stores incoming captured packets. Using a ring buffer limits the space that captured packets take up so they do not fill the system's entire memory. The downside of ring buffers is that they are susceptible to overflow, causing TShark to crash when the input and output positions collide. Setting the listening interface
to the LAN port ensures that all traffic going through is either to, from, or between local users.
Lastly, the capture output filters must be set correctly
to communicate with the data aggregator program. The
output features are listed in Table 2.
Data Aggregator
The data aggregator consists of three main sections: the initial data parser and filter, the data managers, and the data reporters.
4.1.2.1. Filter
The filter contains the following two rules for acceptable packets:
1) Only TCP and UDP packets are accepted.
2) Accepted packets have one peer inside the local network and one peer outside the local network.
TSHARK ID DESCRIPTION
ip.proto IP Protocol
ip.src IP Source Address
ip.dst IP Destination Address
ip.len IP Packet Length
udp.srcport UDP Source Port
udp.dstport UDP Destination Port
tcp.srcport TCP Source Port
tcp.dstport TCP Destination Port
tcp.flags.ack TCP ACK Flag
tcp.flags.fin TCP FIN Flag
tcp.flags.reset TCP RESET Flag
http.user_agent HTTP User Agent
Table 2 TShark capture output filter items.
Figure 2 Sensor block diagram. Rounded squares represent sequences of events initiated by input from TShark. Diamonds represent events that can cause data to be sent to the Data Server.
Although the data aggregator is designed to accept and report on both TCP and UDP packets, in
practice, only TCP packets are sent to the data aggregator. Allowing UDP packets would overtask
the system and cause collisions in the TShark ring buffer.
4.1.2.2. Data Managers
The data managers are the computational meat of the sensor. There are three types of data manag-
ers: router, peer, and stream. Their functions are described below and summarized in Algorithm
1. The relationship between managers is described in Figure 3.
There is only one router manager, and it directly accepts all data that has passed through the filtering stage. Data coming into the router manager is used to update router-level metrics before being passed to the appropriate peer manager (based on the source and destination IP addresses).
There is one peer manager for each IP address in the local network. Peer managers are created when a new packet is captured whose source or destination IP is a local address and does not already correspond to an existing manager. Peer managers are torn down when there are no more active TCP connections to the peer or when the peer has not sent nor received a message within a timeout period. Packets passed to the manager are used to update peer-level metrics before being passed to a stream manager (TCP packets) or discarded (UDP packets).
Receive Data
  If data fails filter
    Discard data
  Else
    Pass data to RouterManager
      Update Router Metrics
      If New Local IP
        Create new PeerManager
      Pass Data to PeerManager
        Update Peer Metrics
        If New IP/Port Pair
          Create new StreamManager
          Report Stream Open
        Pass Data to StreamManager
          Update Stream Metrics
          If TCP.RESET or (TCP.FIN and TCP.ACK)
            Report Stream Close
            Tear down StreamManager
Algorithm 1 Sensor data capture behavior
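The manager hierarchy of Algorithm 1 can be condensed into a runnable sketch. Class and method names here are invented for illustration; the real metric set, timeouts, and reporting threads are omitted.

```python
class StreamManager:
    def __init__(self):
        self.packets = 0

class PeerManager:
    def __init__(self):
        self.streams = {}  # (foreign_ip, port) -> StreamManager
        self.packets = 0

class RouterManager:
    """Condensed sketch of Algorithm 1 (reporting stubbed as an event list)."""
    def __init__(self):
        self.peers = {}    # local ip -> PeerManager
        self.packets = 0
        self.events = []

    def handle(self, local_ip, foreign_ip, port, flags=()):
        self.packets += 1                           # router-level metrics
        peer = self.peers.setdefault(local_ip, PeerManager())
        peer.packets += 1                           # peer-level metrics
        key = (foreign_ip, port)
        if key not in peer.streams:                 # new IP/port pair
            peer.streams[key] = StreamManager()
            self.events.append(("open", local_ip, key))
        peer.streams[key].packets += 1              # stream-level metrics
        # tear down on RESET, or on FIN together with ACK
        if "RST" in flags or ("FIN" in flags and "ACK" in flags):
            del peer.streams[key]
            self.events.append(("close", local_ip, key))
```

The dict-per-level layout mirrors Figure 3: one router manager owning peer managers, each owning its active stream managers.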
Figure 3 Data Manager Layout
FireSense refers to ongoing TCP connections between two peers as streams. An example stream
is an ftp connection between peers A and B. The stream begins when the file transfer is set up and
ends when the transfer is torn down. Each peer manager contains a stream manager for each active
stream. New stream managers are created when a peer manager processes a packet whose foreign
IP/port pair is unknown. When a new stream is created, a stream open event is generated that immediately
reports the stream creation to the data server. Stream managers are torn down when the connection
is destroyed or when no packet has been sent or received within a timeout period. Destroyed
streams are detected when a packet contains either the TCP RESET flag or both of the TCP FIN
and ACK flags. When a stream manager is destroyed, it generates a stream close event that is
immediately reported to the data server.
4.1.2.3. Reporters
There are three reporters in the sensor: router, peer, and stream. Each reporter runs in a separate thread, making a total of four threads in the data aggregator: the input handler and three reporters.
Threads must obtain a lock on the Router Manager before writing to or reading any manager to
prevent data overwrites and collisions. Each reporter is responsible for managing timeouts and
periodic reports. Each reporter has a unique, configurable update interval. When an update interval elapses, the reporter searches for managers that have timed out and removes them before reporting metrics for each manager of the reporter's type. Managers have their metrics
cleared after a report. Metrics are in terms of the reporting interval.
Periodic and event reports are sent to the data server through a TCP connection in JSON format.
The JSONs are labeled by the manager/reporter type and the message event trigger (open, close,
or periodic). The exact JSON formats can be found in the appendix. Peer IPs are anonymized by
performing an exclusive or between the last byte of the IP address and a mask byte randomly
generated when the data aggregator program begins. This protects the privacy of users by making
it harder to correlate activity with a specific user IP.
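The anonymization step can be sketched as follows; the helper name is invented, and in the real aggregator the mask byte is drawn once at program start rather than per call.

```python
import os

MASK = os.urandom(1)[0]  # random mask byte chosen once at startup

def anonymize(ip, mask=None):
    """XOR the last byte of a dotted-quad IPv4 address with the session mask."""
    m = MASK if mask is None else mask
    parts = ip.split(".")
    parts[-1] = str(int(parts[-1]) ^ m)
    return ".".join(parts)
```

Because XOR is its own inverse, applying the same mask twice recovers the original address; anonymity therefore rests on the mask never leaving the sensor.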
Figure 4 Data server file structure. Peer.txt, stream.txt, event.txt, and router.txt contain lists of JSON objects received from the sensor.
4.2. The Data Server
The FireSense data server is responsible for receiving and storing information for processing. It
listens for sensor reports on a configurable port and logs the data to file. Files are organized first
by date, then by anonymized local peer IP (if applicable) as shown in Figure 4. Files router.txt,
peer.txt, stream.txt, and event.txt contain lists of the received JSON objects from the sensor.
4.3. Determining Occupancy
Occupancy is detected by extracting features from the sensor reports, time-binning the features,
and running the classification algorithm on the resulting matrix. In general, features can be divided
into two subsets: peer connections and router-level statistics with the addition of time of day as a
feature. Features are summarized and binned into time-periods of size T. Typically, a time period
T = 1 minute works fairly well. A full list of features can be found in the appendix.
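The time-binning step might look like the following; the bin width and the (timestamp, value) sample layout are assumptions for illustration.

```python
from collections import defaultdict

def bin_samples(samples, bin_seconds=60):
    """Group (timestamp, value) samples into fixed-width time bins.
    Returns {bin_start_timestamp: [values, ...]} for later summarizing."""
    bins = defaultdict(list)
    for ts, value in samples:
        bins[ts - ts % bin_seconds].append(value)
    return dict(bins)
```

Each bin's value list is then reduced to scalar features (sums, maxima, and so on) before being handed to the classifier.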
Peer Connections
4.3.1.1. Defining Peer Connections
Peer connections are defined as TCP connections between a peer and an external/foreign IP where
the port on either the external or the local side is one of the tracked ports. Table 3 lists examples
of what does and does not qualify as peer connections. Peer connections can be identified by the qualifying end (external or local) and the tracked port number. If both sides use a tracked port number,
two connections are said to exist. In practice, the “no connection” scenario does not exist, since
the ports used in processing are the same as the capture-filter ports shown in Table 1. Peer
connections are created based on the open/close events received from the sensor.
Once detected, connections are split into three useful categories: foreign connections, remote connections, and web connections. Any connection that does not fall into these categories is not used for occupancy processing. Foreign connections are all connections that carry the "foreign" qualifier. Remote connections are local connections under a remote port listed in Table 4. Web connections are foreign connections whose "user agent" field is filled in. User agent fields are
Table 3 Example connections. Tracked port numbers can be found in Table 1. For any TCP connection, zero, one, or two peer connections can be formed.
LOCAL PORT EXTERNAL PORT CONNECTION
5600 80 Foreign 80
7200 5600 None
5222 8095 Local 5222
80 443 Local 80 / Foreign 443
Table 4 List of remote ports.
PORT NAME DESCRIPTION
21 ftp FTP Control (Command)
22 ssh Secure Shell (SSH) service
23 telnet Telnet service
3283 NetAssistant Apple Remote Desktop
3389 RDP Windows Remote Desktop Protocol
5900 VNC Apple Remote Desktop
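The qualification rule behind Table 3 can be sketched as a small helper; the function name is invented and the port set below is only a subset of Table 1.

```python
TRACKED_PORTS = {21, 22, 23, 80, 110, 143, 443, 1723, 5222, 5900, 8080}

def peer_connections(local_port, foreign_port):
    """Return the peer connections formed by one TCP connection:
    zero, one, or two, each qualified by which end uses a tracked port."""
    conns = []
    if local_port in TRACKED_PORTS:
        conns.append(("local", local_port))
    if foreign_port in TRACKED_PORTS:
        conns.append(("foreign", foreign_port))
    return conns
```

Running the helper over the rows of Table 3 reproduces the listed qualifications, including the two-connection case where both ends use tracked ports.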
generally only filled in by web browsers, although automatic browser traffic (such as updates) can
also fill in the field. This field can be retrieved from HTTP packets by TShark. The remaining unutilized connections are local connections that are not under a remote port.
Foreign port and web connections are useful as inputs to the SVM algorithm. In general, when monitored ports are more active or web usage is high, the likelihood of a person being at the computer also increases. However, this assumption can be misleading at times. If a person is remotely accessing a computer in the monitored space, they will likely generate increased traffic and, subsequently, more port events. To deal with this issue, we mask foreign port connections with remote connections: all connections are considered inactive while a remote connection exists.
Virtual private networks (VPNs) are another major concern for false positives in occupancy de-
tection. During the experiment, the firewall included an active PPTP connection but it did not
prove to be an issue. PFSense creates new interfaces that handle PPTP traffic. This keeps VPN
traffic separate from LAN traffic and, in turn, invisible to FireSense.
4.3.1.2. Using Peer Connections
Now that peer connections have been defined, it is important to know how they are linked to SVM
inputs, and ultimately, to the detection of occupancy. FireSense uses the number of currently active
connections on each port as the SVM feature. There are N+1 peer connection features fed to the
SVM, where N is the number of tracked ports and the extra feature summarizes web connections.
Each feature represents the current number of connections to that port P, currently active within
the monitored space.
Foreign_P = Σ_Peers ( Σ Port-P Connections )
However, as mentioned at the beginning of Section 4.3, our features are time-binned. Peer con-
nection features must represent the status of the time-bin, so a new formula must be created based
on the instantaneous formula. The new feature we suggest for port P is the total number of unique
connections to P over the last period T. This includes connections that conform to any of the fol-
lowing criteria: 1) the connection began in period T or 2) the connection began before period T
but was still active in period T. This rule is summarized in the following equation and illustrated in Figure 5.
Foreign_P = Σ_Peers ( Σ Initiated P Connections + Σ Continuing P Connections )
Figure 5 Illustration of connections counted in period T. Colored lines represent connection durations. Blue connec-
tions would be counted in period T. Red connections would not.
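The two criteria (began in T, or began earlier and still active in T) amount to an interval-overlap test, which might be written as follows; the half-open boundary convention is an assumption.

```python
def connections_in_bin(intervals, bin_start, bin_len=60):
    """Count connections active at any point during [bin_start, bin_start+bin_len):
    those that opened inside the bin plus those opened earlier but still open."""
    bin_end = bin_start + bin_len
    return sum(1 for open_t, close_t in intervals
               if open_t < bin_end and close_t > bin_start)
```

Each (open, close) pair would come from the stream open/close events reported by the sensor, and the count is computed per tracked port to fill the Foreign_P features.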
In total, thirty-two of the SVM features are attributed to foreign port and web connections. Most
of these features are sparse, and some are never populated in the sample set, but all are included
in the system for the sake of completeness and robustness.
Router-Level Statistics
Router-level statistics are taken from the periodic router updates from the sensor. These statistics
include information on active peers, TCP packet counts, and TCP packet sizes. The router update
period of the sensor during the experiment was arbitrarily set to five seconds, much shorter than
the average time bin used for occupancy evaluation. Although information within a time bin can
be easily summarized by summing or averaging values, more information can be gleaned by taking
and comparing varying statistical quantities. To investigate these options, a range of statistical
functions were identified for use as SVM features. Table 5 details the router statistics (horizontal)
and the summarizing functions considered as features, along with the combinations ultimately chosen for use. These values most closely resembled occupancy levels
throughout the day.
The chosen features, shown in Figure 7, are predominantly focused on outward traffic. Most user-generated traffic is in request form, and outward traffic is ideal for tracking requests. Additionally, since inbound traffic is generated from external sources, it can be less predictable and more prone
to false spikes. The same argument applies for total packet count and sizes, since they are heavily
affected by inbound traffic.
We again stress that the router-level features are actually the results of statistics applied to periodically reported values from the sensor. Consider the case where five peers actively send or receive packets within the time period T. For a router reporting period R, there are T/R samples within the time bin. If, during any one of those samples, all peers send or receive packets, the feature value will be five. However, if each peer is active in a different sample, the result will be one. Note that there
is no feature that will give the number of unique peers active within the time bin. Table 6 lists the
router-level features and their equations.
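A sketch of reducing one bin's router samples to the six chosen features follows, using Python's statistics.quantiles for the IQR. The feature names follow the series labels in Figure 7; the input layout (parallel per-sample lists) is an assumption.

```python
from statistics import quantiles

def router_features(out_counts, peers_per_sample, largest_out):
    """Summarize one time bin's per-R router samples into the six features."""
    q1, _, q3 = quantiles(out_counts, n=4)  # quartiles of the bin's samples
    return {
        "Peers_Max":       max(peers_per_sample),
        "TCP_Out_Max":     max(out_counts),
        "TCP_Out_Min":     min(out_counts),
        "TCP_Out_Sum":     sum(out_counts),
        "TCP_Out_IQR":     q3 - q1,
        "TCP_Out_Max_Min": min(largest_out),  # min over samples of largest packet
    }
```

Note how Peers_Max takes the maximum over samples, matching the caveat above: it cannot recover the number of unique peers active across the whole bin.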
Router statistics (available): Active Peers; Packet Count; Packet Count (Out); Packet Count (In); Largest Packet; Largest Packet (Out); Largest Packet (In).
Summarizing functions (available): Max; Min; Sum; Mean; Standard Deviation; Median; Interquartile Range (IQR); Range.
Chosen combinations: Max(Active Peers); Max(Packet Count (Out)); Min(Packet Count (Out)); Min(Largest Packet (Out)); Sum(Packet Count (Out)); IQR(Packet Count (Out)).
Table 5 Available vs. chosen router statistics. Functions are applied to router statistics; the six chosen combinations are used as SVM features.
FEATURE EQUATION
Maximum Active Peers  Max_T( Σ Unique Peers_R )
Maximum Packet Count (Out)  Max_T( Count_R(Packets) )
Minimum Packet Count (Out)  Min_T( Count_R(Packets) )
Sum of Packet Count (Out)  Σ_T Count_R(Packets)
Interquartile Range of Packet Count (Out)  IQR_T( Count_R(Packets) )
Minimum Largest Packet (Out)  Min_T( Max_R(Packet Size) )
(Subscript T denotes a function over all samples in the time bin; subscript R denotes a value from one reporting period.)
Table 6 Router-level features and their equations.

Figure 7 Occupancy plotted over the router-level statistic features for one day (series: TCP_Out_Max_Min, Peers_Max, TCP_Out_Max, TCP_Out_Min, TCP_Out_IQR, TCP_Out_Sum, Occupancy). Both occupancy and the statistics are plotted as fractions of their maximum values in the day.
Figure 6 Camera server file system layout
4.4. Gathering the Ground Truth
Ground truth occupancy for the experiment was gathered by monitoring entry points into the lab.
Pictures are taken by cameras mounted near the doors and stored in the directory structure shown
in Figure 6. The pictures can be examined later to establish entry and exit times.
To save researchers from examining an entire day's data, camera functionality is linked to sensors
on the doors that produce open and close events. When an event occurs, a message is sent to the
camera server, which is housed on the same virtual machine as the FireSense server but uses a
different port. The message contains a door identifier and either "Open" or "Closed".
When the server receives an open message, it begins recording pictures of the door. Pictures are
recorded every half-second until the corresponding close message is received. All pictures taken
during one open-to-close period are stored in a folder labeled by the open time.
The camera server is implemented as a multithreaded program, containing one thread per door, an
additional thread to manage the server interface, and finally, the main thread. The door threads
have two states: waiting for an open door and processing an open door. Each door thread has an
associated open-door Boolean value, which can only be set or unset by the server interface
thread. Algorithm 2 shows the functionality of the door and interface threads. Figure 8 shows a
functional example of the detection process.
Algorithm 2 Camera server algorithms. A) Door thread process. B) Server interface process.
Loop Forever
    If door open event
        Get time
        Create time-stamped folder
        While door is open
            Record picture
            Wait 0.5 seconds
(a)

Loop Forever
    Wait for message
    Get door, open/close from message
    If open
        Set door indicator to open
    If close
        Set door indicator to close
(b)
Figure 8 Camera server and door sensor example. Sensors monitor the doors and send open and close events to the
server, which then begins or ends the capture process. The boxed texts "L Open" and "R Closed" are messages
relayed to the server by the sensors.
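A minimal threaded sketch of Algorithm 2 in Python (the capture callback, the "door,State" message format, and the timing constants are illustrative assumptions, not the actual camera-server code):

```python
import threading
import time

class DoorThread(threading.Thread):
    """Door thread (Algorithm 2a): waits for its door to open, then records
    a picture every `interval` seconds until the door closes."""
    def __init__(self, door_id, capture, interval=0.5):
        super().__init__(daemon=True)
        self.door_id = door_id
        self.capture = capture            # callable returning one picture
        self.interval = interval
        self.is_open = threading.Event()  # set/cleared only by the interface
        self.pictures = []

    def run(self):
        while True:
            if self.is_open.wait(timeout=0.05):      # door open event
                folder = time.strftime("%H-%M-%S")   # time-stamped folder
                while self.is_open.is_set():         # while door is open
                    self.pictures.append((folder, self.capture(self.door_id)))
                    time.sleep(self.interval)

def handle_message(doors, message):
    """Server interface thread (Algorithm 2b): messages look like 'L,Open'."""
    door_id, state = message.split(",")
    if state == "Open":
        doors[door_id].is_open.set()
    elif state == "Closed":
        doors[door_id].is_open.clear()
```

All pictures from one open-to-close period share the same time-stamped folder label, matching the storage layout described above.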
5. EVALUATION
5.1. Binary Results
The FireSense system was tested over a two-and-a-half-week period, during which eleven weekdays
were evaluated. Five days were used as the training set for the SVM classifiers. Time bins were
set to one minute. Once the data was processed into binned features, it was fed into both
Matlab's svmtrain function and the SVMhmm tool with binary occupancy ground truth (occupied
vs. non-occupied). This way, we could compare a standard SVM classifier with the combined
SVM and HMM classifier. The results are summarized in Table 7. We calculate accuracy as the
percentage of correctly classified time bins throughout the day. Figure 9 shows the calculated
versus actual binary occupancy values for one day. Results from the SVMhmm tool tend to be more
stable than their SVM counterparts due to the HMM's stateful nature.

[Figure 9: one-day occupancy values over 24 hours, showing ground truth, the SVM HMM result, and the Matlab SVM result.]

Table 7 (a, b, c) Occupancy accuracies of Matlab svmtrain/svmclassify versus SVMhmm. A) Overall accuracy on the
entire data set. B) Accuracy over the training data set. C) Accuracy over the remainder of the set.

(a)
DATE        MATLAB SVM   SVM HMM
3/18/2013   82.18%       87.51%
3/20/2013   86.44%       96.40%
3/22/2013   73.59%       85.76%
4/8/2013    84.04%       92.58%
4/9/2013    90.43%       91.93%
4/10/2013   92.36%       92.83%
4/11/2013   94.41%       96.24%
4/12/2013   91.52%       94.90%
4/15/2013   95.26%       95.78%
4/16/2013   92.03%       94.55%
4/17/2013   96.16%       98.17%
Average     88.95%       93.33%

(b)
DATE        MATLAB SVM   SVM HMM
4/8/2013    84.04%       92.58%
4/9/2013    90.43%       91.93%
4/10/2013   92.36%       92.83%
4/11/2013   94.41%       96.24%
4/12/2013   91.52%       94.90%
Average     90.55%       93.70%

(c)
DATE        MATLAB SVM   SVM HMM
3/18/2013   82.18%       87.51%
3/20/2013   86.44%       96.40%
3/22/2013   73.59%       85.76%
4/15/2013   95.26%       95.78%
4/16/2013   92.03%       94.55%
4/17/2013   96.16%       98.17%
Average     87.61%       93.03%
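Accuracy throughout this section is the fraction of correctly classified one-minute bins in a day; a minimal sketch:

```python
def bin_accuracy(predicted, truth):
    """Fraction of time bins whose predicted occupancy matches ground truth."""
    assert len(predicted) == len(truth)
    return sum(p == t for p, t in zip(predicted, truth)) / len(predicted)
```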
Unfortunately, SVMhmm uses an algorithm involving past and future data, resulting in classifiers
whose practical operation differs from their ideal operation. To explore this problem, we simulated
running the classification in delayed real time. The system could rely only on data from the
beginning of the day up to the current time plus the delay. For a five-minute delay, the occupancy
at 11:35 AM is calculated with data from 12:00 AM to 11:40 AM; for a ten-minute delay, data
ranges from 12:00 AM to 11:45 AM. The results are shown in Table 8 and illustrated in Figure 10.
DATE IDEAL 1M DELAY 5M DELAY 10M DELAY 20M DELAY
3/18/2013 86.34% 83.45% 85.52% 87.31% 89.01%
3/20/2013 95.62% 92.83% 93.93% 94.54% 96.40%
3/22/2013 85.95% 75.87% 80.06% 82.51% 84.28%
4/8/2013 91.34% 85.15% 87.93% 89.80% 91.73%
4/9/2013 92.03% 89.87% 90.82% 91.14% 91.36%
4/10/2013 92.92% 92.15% 93.82% 93.58% 92.97%
4/11/2013 96.29% 96.50% 96.49% 96.34% 96.39%
4/12/2013 94.97% 91.59% 93.01% 94.07% 94.76%
4/15/2013 95.69% 95.55% 95.75% 95.88% 95.85%
4/16/2013 94.62% 95.95% 97.27% 97.82% 97.73%
4/17/2013 98.19% 98.34% 98.19% 97.96% 98.17%
Average 93.09% 90.66% 92.07% 92.81% 93.51%
(a)
DATE IDEAL 1M DELAY 5M DELAY 10M DELAY 20M DELAY
4/8/2013 91.34% 85.15% 87.93% 89.80% 91.73%
4/9/2013 92.03% 89.87% 90.82% 91.14% 91.36%
4/10/2013 92.92% 92.15% 93.82% 93.58% 92.97%
4/11/2013 96.29% 96.50% 96.49% 96.34% 96.39%
4/12/2013 94.97% 91.59% 93.01% 94.07% 94.76%
Average 93.51% 91.05% 92.41% 92.99% 93.44%
(b)
DATE IDEAL 1M DELAY 5M DELAY 10M DELAY 20M DELAY
3/18/2013 86.34% 83.45% 85.52% 87.31% 89.01%
3/20/2013 95.62% 92.83% 93.93% 94.54% 96.40%
3/22/2013 85.95% 75.87% 80.06% 82.51% 84.28%
4/15/2013 95.69% 95.55% 95.75% 95.88% 95.85%
4/16/2013 94.62% 95.95% 97.27% 97.82% 97.73%
4/17/2013 98.19% 98.34% 98.19% 97.96% 98.17%
Average 92.73% 90.33% 91.78% 92.67% 93.57%
(c)
Table 8 SVMhmm accuracies for real-time processing with delays over a) the total data set; b) the training set; and c)
the non-training set.
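The delayed-window construction above can be sketched as follows (times in minutes since midnight; the function name is mine):

```python
def available_window(bin_minute, delay_minutes):
    """Data usable when classifying a given minute bin in delayed real time:
    from midnight (minute 0) through the bin's time plus the delay."""
    return (0, bin_minute + delay_minutes)
```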
Figure 10 Occupancy calculations by SVMhmm with varying real-time delays from 1 minute to 20 minutes.
5.2. Estimating Actual Occupancy
In addition to binary occupancy decisions, FireSense information can also be used to hint at the
actual occupancy level of the space. To do this, we attempted to track the number of active
devices in the lab. Peer connection features were calculated for each individual peer. A peer was
considered "active" when, during any minute, the number of active connections on any port
exceeded 30% of the daily maximum for that port. For example, suppose the daily maximum of port
80 connections is 30 and the daily maximum of port 443 connections is 20. If, within a time bin,
port 443 has more than 0.3 * 20 = 6 connections or port 80 has more than 0.3 * 30 = 9
connections, the peer is considered active. Furthermore, peer activity is held for five minutes
after the detected activity drops off, to account for a person being in the lab but not
generating active connections. The
results are detailed in Figure 11 for one day. In this figure, we compare activity calculations with
and without port 80 included. While port 80 tends to be very active during occupied periods, it
is also active during unoccupied ones. This default HTTP port is used by many programs that do
not involve user interaction, including background Dropbox synchronization and program updates.
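A sketch of the activity rule just described (Python; the data layout and names are mine): a peer counts as active in a minute bin if any port's connection count exceeds 30% of that port's daily maximum, and activity is held for five minutes afterwards.

```python
def active_minutes(conns_by_port, threshold=0.3, hold=5):
    """conns_by_port: {port: list of connection counts, one per minute bin}.
    Returns one boolean per minute bin: was the peer considered active?"""
    n = len(next(iter(conns_by_port.values())))
    daily_max = {port: max(counts) for port, counts in conns_by_port.items()}
    # Raw rule: any port exceeds threshold * its daily maximum in this bin.
    raw = [any(counts[i] > threshold * daily_max[port]
               for port, counts in conns_by_port.items())
           for i in range(n)]
    # Hold activity for `hold` bins after the last raw-active bin.
    held, last_active = [], None
    for i, active in enumerate(raw):
        if active:
            last_active = i
        held.append(last_active is not None and i - last_active <= hold)
    return held
```

With daily maxima of 30 (port 80) and 20 (port 443), a bin needs more than 9 or 6 connections respectively, matching the example above.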
Although these results show some false positives, the general trends of occupancy and active
devices have many similarities. The results show promise for tracking occupant activity inside
the space once occupancy has been established. Additionally, while this calculation alone is
clearly not sufficient for determining the occupancy level of the space, it could be used in
conjunction with other sensor data to calculate a good estimate of occupancy.
5.3. Network Traffic vs. Occupancy
Although network traffic can be a great identifier of occupancy, there are often cases when people
are in the lab but not using the network. They could be engaging in other activities or the network
[Four-panel chart omitted; each panel spans 24 hours with a 0-12 y-axis.]
Figure 11 Active peer calculations vs. occupancy. Occupancy is plotted in blue; calculated values are plotted in
orange. A) Occupancy vs. active peers. B) Occupancy vs. active peers with a running-average filter of size 10. C)
Occupancy vs. active peers without port 80. D) Occupancy vs. active peers without port 80 with a running-average
filter of size 10.
could be down. One test day, 3/18/13, summarized in Figure 12, shows this behavior. It is the
reason why occupancy results for that day are notably less accurate than the others, losing as
much as 7% accuracy versus the average in some cases.
6. CONCLUSION
We have shown that FireSense data can be used to generate occupancy decisions in a majority of
situations, with average accuracies over 90%. Additionally, we saw that the calculated number of
active peers correlates with the actual level of occupancy, though not strongly enough to produce
a reasonable estimate on its own. Lastly, we saw that occupants do not always generate network
traffic, leading to prediction errors.
As an occupancy sensor, FireSense works fairly well, but it does have flaws and blind spots.
However, if occupancy information is otherwise known or determined by a combination of
heterogeneous sensors, FireSense's strengths and weaknesses can both be used to gauge user
activity. When occupancy is high and network traffic is low, occupants may not be on their
computers, the network may be down, or there may be collaborative activity. As these areas of
research advance, systems like FireSense, which can be attached to existing infrastructure and
maintained at minimal cost, can become cheap and useful tools in an ever-expanding repertoire.
7. ACKNOWLEDGMENTS
This project was supported by: Haksoo Choi, who helped set up the virtual machine for the camera
and data servers, set up access to the PFSense firewall, and helped with understanding the PFSense
infrastructure; Kevin Ting, who set up and managed the door sensors and messages coming into
the camera server; and Professor Mani Srivastava, for guidance and ideas throughout this en-
deavor.
Figure 12 Chart showing the discrepancy between occupancy and network traffic, specifically around 16:00-17:00.
Features shown: Peers_Max, TCP_Out_Max_Min, TCP_Out_Max, TCP_Out_Min, TCP_Out_IQR, TCP_Out_Sum, each plotted as a
fraction of its daily maximum over 24 hours. The thick black line indicates occupancy.
8. REFERENCES
(2012, December). Retrieved from PFSense: http://pfsense.org
Altun, Y., Tsochantaridis, I., & Hofmann, T. (2003). Hidden Markov Support Vector Machines.
International Conference on Machine Learning (ICML).
Joachims, T. (2012, December). SVM hmm: Sequence Tagging with Structural Support Vector
Machines. Retrieved from Cornell Department of Computer Science:
http://www.cs.cornell.edu/people/tj/svm_light/svm_hmm.html
Moore, A. W., & Zuev, D. (2005). Internet Traffic Classification Using Bayesian Analysis
Techniques. SIGMETRICS'05. Banff, Alberta, Canada: ACM.
Nguyen, T. T., & Armitage, G. (2008). A Survey of Techniques for Internet Traffic Classification
using Machine Learning. IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 10,
NO. 4, 56-76.
Red Hat Enterprise Linux 4: Security Guide Appendix C. Common Ports. (2012, December).
Retrieved from CentOS: http://www.centos.org/docs/4/html/rhel-sg-en-4/ch-ports.html
Service Name and Transport Protocol Port Number Registry. (2012, December). Retrieved from
IANA: http://www.iana.org/assignments/service-names-port-numbers/service-names-
port-numbers.xml
Well known TCP and UDP ports used by Apple software products. (2012, December). Retrieved
from Apple Support: http://support.apple.com/kb/ts1629
Wireshark. (2012, December). Retrieved from http://www.wireshark.org/
9. APPENDIX
Figure 13 List of features used in SVM classifiers for occupancy.
Time of Day Foreign 143 Foreign 2195 Foreign 5222 Foreign 8080
Foreign 21 Foreign 194 Foreign 2196 Foreign 5900 Web Accesses
Foreign 22 Foreign 220 Foreign 3031 Foreign 5988 Max Peers
Foreign 23 Foreign 443 Foreign 3283 Foreign 6665 Max TCP Out Count
Foreign 24 Foreign 993 Foreign 3389 Foreign 6666 Min TCP Out Count
Foreign 80 Foreign 994 Foreign 3724 Foreign 6667 IQR TCP Out Count
Foreign 110 Foreign 995 Foreign 3784 Foreign 6668 Sum TCP Out Count
Foreign 113 Foreign 1723 Foreign 5190 Foreign 6669 Min TCP Out Max Size
Router JSON:
    Type: 'Router'
    Event: 'Periodic'
    Time
    Peers (anonymized)
    PeerCount
    ActivePeerCount
    TcpStats
        Counts: Total, Incoming, Outgoing
        MaxSizes: Incoming, Outgoing
    UdpStats
        Counts: Total, Incoming, Outgoing
        MaxSizes: Incoming, Outgoing

Peer JSON:
    Type: 'Peer'
    Event: 'Periodic'
    Time
    Local IP (Anonymized)
    TcpStats
        Counts: Total, Incoming, Outgoing
        MaxSizes: Incoming, Outgoing
    UdpStats
        Counts: Incoming, Outgoing
        MaxSizes: Incoming, Outgoing

Stream JSON:
    Type: 'Stream'
    Event: 'Open'/'Close'/'Periodic'
    Time
    UserAgent
    Local IP (Anonymized)
    Foreign IP
    LocalPorts
    ForeignPorts
    Counts: Total, Incoming, Outgoing
    MaxSizes: Incoming, Outgoing
Figure 14 JSON formats for sensor communication to the server. Values with single quotes are static values that are
used to identify the JSON type. Event JSONs carry the same information as stream JSONs, but have either “Open”
or “Close” in the event slot.
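For illustration, an assumed concrete instance of the Router JSON (field names follow Figure 14; the exact key spellings, value types, and sample values are guesses, not the sensor's actual wire format):

```json
{
  "Type": "Router",
  "Event": "Periodic",
  "Time": "2013-04-08T11:35:00",
  "Peers": ["peer-01", "peer-02"],
  "PeerCount": 2,
  "ActivePeerCount": 1,
  "TcpStats": {
    "Counts":   {"Total": 120, "Incoming": 70, "Outgoing": 50},
    "MaxSizes": {"Incoming": 1500, "Outgoing": 1460}
  },
  "UdpStats": {
    "Counts":   {"Total": 14, "Incoming": 8, "Outgoing": 6},
    "MaxSizes": {"Incoming": 512, "Outgoing": 340}
  }
}
```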