Page 1: [IEEE 2013 Sixth International Conference on Advanced Computational Intelligence (ICACI) - Hangzhou, China (2013.10.19-2013.10.21)] 2013 Sixth International Conference on Advanced

2013 Sixth International Conference on Advanced Computational Intelligence

October 19-21, 2013, Hangzhou, China

FPGA Targeted Implementation of a Neurofuzzy System for Real Time TCP/IP Traffic Classification

Alessandro Cinti and Antonello Rizzi

Abstract: As Internet traffic grows rapidly, it is necessary to monitor and control TCP/IP flows in order to ensure the quality of service and to filter out unwanted traffic by automatic, effective and inexpensive technical solutions. To this aim, especially when dealing with Gbit/s links, real time TCP/IP traffic classification can be performed by dedicated high speed processing devices, avoiding computationally expensive deep packet inspection techniques and relying only on packet features independent of payload content. In this paper we propose to employ an FPGA to design a stand-alone device using only information available at network layer, namely packet sizes, directions and inter-arrival times, to perform flow classification according to application layer protocol (such as HTTP, FTP, SSH, POP3, etc.). The classification system is based on neurofuzzy Min-Max networks, trained by Adaptive Resolution procedures (ARC and PARC algorithms). In order to deal with very high speed links and a large amount of concurrent traffic flows, we propose a complete FPGA targeted implementation of the whole system. Our design is intended to place on a single FPGA all the needed components, including the neurofuzzy Min-Max classifier. The paper describes in detail some interesting technical solutions aiming at optimizing both FPGA working frequency and circuit complexity.

I. INTRODUCTION

As broadband communications widen the range of popular applications, there is an increasing demand for fast techniques that classify traffic according to the services that generate the data flows [1], [2]. The specific meaning of service depends on the context and purpose of traffic classification. A sufficiently robust classifier could be a useful element in implementing differentiated Quality of Service (QoS) without deploying complex traffic engineering schemes that require cooperation with end hosts. Other uses of traffic classification are security related, e.g. traffic monitoring for intrusion detection or policy enforcement. Different approaches to traffic classification have been developed, using information available at the IP layer such as packet inter-arrival times, packet sizes and the overall number of bytes transferred [3], [4]. Some proposals [5] use supervised machine learning algorithms on a wide set of traffic characteristics, while other works [6] rely on unsupervised machine learning techniques.

Manuscript received June 15, 2013.

Alessandro Cinti is with the Department of Information Engineering, Electronics and Telecommunications, University of Rome "La Sapienza", Rome, Italy (e-mail: alessandro.cinti@uniroma1.it).

Antonello Rizzi is with the Department of Information Engineering, Electronics and Telecommunications, University of Rome "La Sapienza", Rome, Italy (e-mail: antonello.rizzi@uniroma1.it).

978-1-4673-6343-3/13/$31.00 ©2013 IEEE

The main traffic classification techniques can be divided into three types: port based analysis, deep packet inspection (DPI) and statistical based systems. We have decided to follow the last approach by adopting a pattern recognition technique based on neurofuzzy Min-Max networks as the core inference engine. The system is able to recognize and classify application flows by analyzing some features (directions, lengths and inter-arrival times) of their first few IP packets. We have successfully tested the classification algorithm on representative TCP-based applications such as HTTP, FTP, SSH and POP3 [18].

In this paper we provide an implementation proposal describing an architecture for a stand-alone device able to perform real-time TCP/IP flow classification and application filtering, adopting an FPGA (Field Programmable Gate Array) based embedded system solution. An FPGA is a user-programmable integrated circuit that can be thought of as an array of reconfigurable logic blocks (LBs), linked by a hierarchy of reconfigurable interconnections. Programming an FPGA simply means using and configuring a subset of LBs and defining data links among them in order to realize a given digital system. The inherent parallelism of the logic resources on an FPGA allows for considerable computational throughput even at low MHz clock rates. In the technical literature related to the Soft Computing and Pattern Recognition fields it is possible to find many FPGA implementations of complex algorithms. In particular, there are very interesting papers dealing with the hardware implementation of neural networks and fuzzy systems on FPGA [11], [12], [13].

Our research team has successfully faced several pattern recognition problems, such as the ones described in [14], [15], [16], using neurofuzzy Min-Max networks trained by the ARC (Adaptive Resolution Classifier) and PARC (Pruning ARC) algorithms [8]. In fact, among neurofuzzy classifiers, Simpson's Min-Max networks have the advantage of being trained in a constructive way. The ARC and PARC training algorithms are characterized by a high degree of automation and make it possible to synthesize neurofuzzy Min-Max networks with a remarkable generalization capability. For this reason we have planned to adopt neurofuzzy Min-Max classifiers as the core of the proposed TCP/IP flow classifier, facing the design of a high performance embedded system.


II. CHARACTERIZATION OF APPLICATION FLOWS

Classification is based on features of the first few IP packets of a traffic flow, so that on-line implementation is feasible. The most prominent packet features that can be observed in real time, even for encrypted flows, are payload size, direction and packet inter-arrival times. Our approach based on machine learning systems demonstrates that this choice is sufficient to obtain highly accurate classification results.

A traffic flow F is defined as the bi-directional, ordered sequence of IP packets exchanged during a TCP connection, where ordering is based on time stamps at the capture point. Within a TCP connection, application data as well as control segments are delivered, such as those related to the three-way handshake (RFC 793) and TCP ACKs. A regular flow is made up of N packets, numbered in order from the packet carrying the TCP SYN segment (PK_0) to the one carrying the TCP FIN segment (PK_{N-1}). After detecting a TCP SYN segment, we expect a corresponding TCP FIN to occur within a given maximum connection lifetime e; otherwise we truncate the flow when a time e elapses after the capture of PK_0. According to [7] and [9], we set e = 600 seconds. Packets of a flow share common TCP/IP header values that we call the "flow identifier": IP source address, IP destination address, TCP source port, TCP destination port and Protocol Type. For classification purposes, we consider the following features for each IP packet of a flow: direction, length and arrival time. Hence each flow is characterized as an ordered sequence of tuples {d_n, l_n, T_n}, with n = 0, ..., N - 1, that we call "packet characteristics". Specifically, for each IP packet PK_n:

• d_n is a binary digit in {0, 1} that represents the packet direction: value 1 encodes the direction of the SYN packet and value 0 the opposite direction;

• l_n is the length of the packet expressed in bytes: the maximum length l_max that we found in our experiments is 1500 bytes, while the minimum length l_min obtained in all measurements ranges between 40 and 56 bytes, depending on options in the TCP and IP headers;

• T_n is the time stamp of the packet at the capture point.
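The grouping of captured packets into flows and packet characteristics can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the `Packet` record, its `is_syn` field (true only for a pure SYN) and the helper names are hypothetical and not part of the original design, and FIN detection is left out for brevity.

```python
from collections import namedtuple

# Hypothetical packet record; is_syn is True only for a pure SYN (SYN set, ACK clear).
Packet = namedtuple(
    "Packet", "src_ip dst_ip src_port dst_port proto length timestamp is_syn"
)

MAX_LIFETIME = 600.0  # maximum connection lifetime e, in seconds


def flow_key(pkt):
    """Direction-independent flow identifier: both endpoints plus protocol."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (min(a, b), max(a, b), pkt.proto)


def packet_characteristics(packets):
    """Return {flow key: [(d_n, l_n, T_n), ...]} for flows whose SYN was seen."""
    flows = {}      # flow key -> list of (d_n, l_n, T_n)
    initiator = {}  # flow key -> endpoint that sent the SYN
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        key = flow_key(pkt)
        sender = (pkt.src_ip, pkt.src_port)
        if key not in flows:
            if not pkt.is_syn:
                continue  # start of flow not observed: skip the packet
            flows[key] = []
            initiator[key] = sender
        elif pkt.timestamp - flows[key][0][2] > MAX_LIFETIME:
            continue  # flow is truncated once e seconds have elapsed since PK_0
        d = 1 if sender == initiator[key] else 0  # 1 = same direction as the SYN
        flows[key].append((d, pkt.length, pkt.timestamp))
    return flows
```

Sorting the two endpoints inside `flow_key` is one simple way to make both directions of a connection map to the same identifier; the direction bit is then recovered by comparing each sender with the endpoint that sent the SYN.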

Given the overall raw traffic trace, flows are identified by applying a filter matched to the flow identifier. For applications relying on TCP transport, the start and end of a flow are detected by the TCP header flags SYN and FIN respectively. For each newly detected flow we store the ordered set of features extracted from the first packets (direction, length and arrival time). A key issue concerns which packets of a given flow should be used in constructing the pattern feeding the classifier. In this regard, application related information should be isolated from TCP and network related effects, e.g. TCP ACKs or TCP control segments (e.g. three-way handshake packets), end-to-end round trip times, and retransmissions triggered by TCP. Hence, the following packets are not considered:

• the first two packets carrying TCP three-way handshake messages: PK_0 = SYN and PK_1 = SYN-ACK;

• TCP ACK packets: packets carrying only a TCP level ACK and no payload data can be recognized because their length is equal to the length of PK_0;

• retransmitted packets, which can also be recognized.

An example of such packet filtering in a connection from client C to server S is given in Fig. 1.

Fig. 1. TCP application flow (J = 3): (a) before and (b) after preprocessing.

The preprocessed data relevant to a given flow F are the tuples {d_n, λ_n = l_n - l_ACK, τ_n = T_n - T_0}, where l_ACK is the length of the TCP ACKs, equal to the length of PK_0, and T_0 is the time stamp of PK_0, i.e. the SYN packet. Packet lengths are thus decreased by the TCP and IP header lengths, so as to leave the actual application related data length. Packets turning out to have λ_n = 0 are discarded (they are just TCP ACKs). Let J denote the number of pre-processed packets with positive length. Then the pre-processed flow is F* = {d_n, λ_n, τ_n}, n = 1, ..., J. After tests and analysis of results, we set the target value of J to strike a convenient trade-off between high classification accuracy and an acceptable classification delay. As our approach is aimed at real time use, we set a maximum value of J equal to ten.
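The preprocessing just described can be sketched as a short Python function. It is a minimal illustration, not the authors' implementation: the function name and the `max_packets` parameter are hypothetical, the input is assumed to be the list of (direction, length, time stamp) tuples of one flow starting at the SYN packet, and retransmission detection is omitted because it would need TCP sequence numbers that are not part of these tuples.

```python
def preprocess_flow(chars, max_packets=10):
    """Turn [(d_n, l_n, T_n), ...] (starting at PK_0) into the preprocessed flow.

    Returns up to `max_packets` tuples (d_n, lam_n, tau_n), where
    lam_n = l_n - l_ACK and tau_n = T_n - T_0.
    """
    if not chars:
        return []
    _, l_ack, t0 = chars[0]  # l_ACK is taken equal to the length of PK_0 (SYN)
    result = []
    for d, length, t in chars[2:]:  # skip PK_0 (SYN) and PK_1 (SYN-ACK)
        lam = length - l_ack
        if lam <= 0:
            continue  # pure ACK or other payload-free control segment
        result.append((d, lam, t - t0))
        if len(result) == max_packets:
            break  # enough packets for classification
    return result
```

Applied to a flow shaped like the one in Fig. 1 (three application packets interleaved with ACKs and followed by the FIN exchange), the function would keep only the three payload-carrying packets, i.e. J = 3.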

III. MIN-MAX NEUROFUZZY NETWORKS

From the point of view of data driven modeling techniques, many practical applications concerning diagnostic and identification problems can be expressed and solved as classification problems. Basically, a classification problem can be defined as follows. Let P : R^D -> L be an unknown oriented process to be modeled, where R^D is the domain set and the co-domain L is a label set, i.e. a set in which it is not possible (or it is misleading) to define an ordering function and, hence, any dissimilarity measure between its elements. Let K be the number of classes in L. Let S_tr and S_ts be two sets of input-output pairs, namely the training set and the test set, subject to the constraint S_tr ∩ S_ts = ∅. Once a target model family T is fixed, a training algorithm is in charge of synthesizing a particular instance T* of T, exclusively on the basis of the information contained in S_tr, such that the classification error of T* computed on S_ts is minimized.

Once trained, a classification model should be able to classify correctly any D-dimensional input vector X belonging to the classification process domain (generalization capability). The Min-Max classification strategy consists in directly defining the decision regions of the unknown classification process to be modeled by covering the patterns of the training set with hyperboxes. This technique was originally proposed in [10]. A hyperbox defined in R^D is a finite polyhedral region delimited by 2D hyperplanes, each constrained to be parallel to the coordinate axes of the input space reference system. On the basis of these constraints, the size and the position of each hyperbox are uniquely established by two vertices, namely the minimum (Min) point v and the maximum (Max) point w, where v and w are respectively the closest and the farthest vertices to the origin of the domain reference system. Since we are facing an exclusive classification problem, each hyperbox is associated with a unique class label k. Several hyperboxes can be associated with the same class label k.

With the notation HB_jk we mean that the j-th hyperbox is associated with the class label k. Considered as a crisp set, each hyperbox can be fuzzified by associating a membership function with it. In the following we will consider the membership function proposed by Simpson [10], in which the slope outside the hyperbox HB_jk is established by the real and positive fuzziness parameter γ, i.e.:

$$
\mu_{jk}(X) = \frac{1}{D} \sum_{i=1}^{D} \Bigl[ 1 - f\bigl(X(i) - w_{ijk};\, \gamma\bigr) - f\bigl(v_{ijk} - X(i);\, \gamma\bigr) \Bigr] \tag{1}
$$

where f(z; γ) is a soft-limiter function defined in R+ that assumes the following values: 0 when γz is less than 0, 1 when γz is greater than 1, and γz otherwise.

A Min-Max classification model is a feed-forward three-layer neural network. The first layer is a dummy one, aiming only to supply the input features to each neuron of the second (hidden) layer. Each neuron of the hidden layer corresponds to a hyperbox and computes the membership of the input pattern with respect to that hyperbox. The third layer is composed of one neuron for each class. Each neuron of the output layer determines the fuzzy membership value of the input pattern with respect to the corresponding class k, by computing the fuzzy union of the outputs of all neurons in the hidden layer associated with that class. When dealing with exclusive classification problems, the class corresponding to the maximum membership value is selected as the output class label (winner takes all strategy, WTA). Starting from a given S_tr, a constructive learning algorithm for a Min-Max network must establish the number, position and size of each hyperbox. To this aim, we use the Adaptive Resolution Classifier (ARC) and Pruning Adaptive Resolution Classifier (PARC) learning algorithms. A detailed description of the ARC/PARC training procedure can be found in [8].

Fig. 2. Simpson's Min-Max neural network.
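As a software reference for the three-layer model, the following Python sketch evaluates the membership function (1) for each hyperbox, takes the fuzzy union per class and applies the winner-takes-all rule. The function names and the representation of a hyperbox as a `(v, w, label)` triple are illustrative assumptions, and the fuzzy union is assumed here to be the maximum operator; training (ARC/PARC) is not covered.

```python
def soft_limiter(z, gamma):
    """f(z; gamma): 0 if gamma*z < 0, 1 if gamma*z > 1, gamma*z otherwise."""
    return min(1.0, max(0.0, gamma * z))


def membership(x, v, w, gamma):
    """Membership of input vector x in the hyperbox with Min point v, Max point w."""
    total = sum(
        1.0 - soft_limiter(xi - wi, gamma) - soft_limiter(vi - xi, gamma)
        for xi, vi, wi in zip(x, v, w)
    )
    return total / len(x)


def classify(x, hyperboxes, gamma):
    """hyperboxes: list of (v, w, label). Returns (winning label, class memberships)."""
    class_mu = {}
    for v, w, label in hyperboxes:
        mu = membership(x, v, w, gamma)
        # Output layer: fuzzy union (assumed to be max) over hyperboxes of a class.
        class_mu[label] = max(class_mu.get(label, 0.0), mu)
    winner = max(class_mu, key=class_mu.get)  # winner takes all
    return winner, class_mu
```

A point lying inside a hyperbox gets membership 1 for it, and the membership decays linearly with slope γ along each coordinate as the point moves away from the box.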

IV. SYSTEM IMPLEMENTATION

After verifying, through a suitable software implementation, that the proposed classification system achieves very encouraging performance, we addressed the computational load problems that could arise from managing high speed data links by relying on a dedicated hardware solution. To this aim we started by designing a standalone system able to classify TCP/IP flows in real time and to filter out undesired connections. The system has been conceived to obtain a satisfactory tradeoff between the maximum number of concurrent connections on an Ethernet link and the required system working frequency, in what is a typical scenario for the TCP/IP flow classifier (in the following, FC). We distinguish the operations that can be performed off-line from the ones that must be performed in real time. An external computer (PC) is dedicated to all the off-line operations, such as synthesizing the neurofuzzy classification model (ARC/PARC procedures), configuring the FPGA and monitoring the whole system. The FC is in charge of classifying in real time a limited number of TCP/IP connections flowing in the Ethernet link between the Switch and the Router.

In Fig. 3 we show the conceptual scheme of the internal architecture of the system, highlighting the main functional blocks. One microprocessor, executing a specific program, coordinates all the operations necessary to make the system work. Communication with the PC is performed by a dedicated peripheral (we adopted a UART peripheral, but higher speed ones can be used as well). Through this communication channel the PC transfers the configuration parameters to the FC device and, at the same time, it can receive and process the classification results computed by the FC. Through the two MAC/PHY blocks, packets flowing in both directions enter the system and reach the ANALYZER/LOOPER block, which is in charge of sniffing and forwarding them so as not to alter the data flow on the Ethernet link. To this aim, packets received from a given MAC/PHY are sent unaltered, through the other MAC/PHY block, to the device for which they were originally destined (the Switch or the Router, depending on the considered direction). At the same time, packets are analyzed to extract information from their IP and TCP headers: flow identifiers, denoted with letter (a) in Fig. 3, and packet characteristics, denoted with letter (b).

In Fig. 4 we show the flow diagram of the algorithm implemented by the ANALYZER/LOOPER, CAM and MEM blocks. Flow identifiers (a) are sent from the ANALYZER/LOOPER block to the CAM block (associative memory), which allows or prevents the storage of packet characteristics (b) inside the MEM block (memory). We define a packet as "valid" when it is not a SYN, SYN-ACK, ACK or FIN packet. When the prefixed number of valid packets belonging to the same connection has been reached, all the stored characteristics relative to that connection, denoted with (c) in Fig. 3, are sent to the CLASSIFIER block. This block consists of a neurofuzzy Min-Max model that computes the class, denoted with (d), associated with the input vector (c). Two distinct mechanisms allow a CAM location to be released. The first one consists in copying all the collected characteristics of valid packets corresponding to the same flow identifier from the MEM block to the CLASSIFIER block. The second one occurs when a timeout after the reception of a valid packet of a connection is reached. Finally, the computed value (d) and its flow identifier are sent both to the PC and to the BLOCKER CAM block. This block consists of an associative memory containing all the information about the connections to be blocked. It is in charge of opening the bypass loop inside the ANALYZER/LOOPER block every time a new packet belonging to an undesired flow enters the system.
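The interplay between the CAM and MEM blocks can be modeled in software with a small flow table, sketched below in Python. This is a behavioral illustration under several assumptions that the text does not confirm: a CAM location is assumed to be allocated when a SYN is seen, the timeout is assumed to be measured from the last valid packet (or from allocation if none has arrived), and the class name, method names and `kind` labels are hypothetical.

```python
class FlowTable:
    """Behavioral model of the CAM (flow identifiers) plus MEM (characteristics)."""

    def __init__(self, capacity, packets_per_flow, timeout):
        self.capacity = capacity                  # number of CAM locations
        self.packets_per_flow = packets_per_flow  # prefixed number of valid packets
        self.timeout = timeout                    # seconds
        self.entries = {}  # flow id -> {"chars": [...], "last_seen": time}

    def _expire(self, now):
        stale = [f for f, e in self.entries.items()
                 if now - e["last_seen"] > self.timeout]
        for flow_id in stale:
            del self.entries[flow_id]  # release mechanism 2: timeout

    def on_packet(self, flow_id, kind, chars, now):
        """kind is 'syn', 'control' or 'valid'.

        Returns the stored characteristics when the flow is ready for the
        classifier, otherwise None.
        """
        self._expire(now)
        if kind == "syn":
            if flow_id not in self.entries and len(self.entries) < self.capacity:
                self.entries[flow_id] = {"chars": [], "last_seen": now}
            return None
        entry = self.entries.get(flow_id)
        if entry is None or kind != "valid":
            return None  # untracked flow, or SYN-ACK / ACK / FIN packet
        entry["chars"].append(chars)
        entry["last_seen"] = now
        if len(entry["chars"]) < self.packets_per_flow:
            return None
        del self.entries[flow_id]  # release mechanism 1: data handed to classifier
        return entry["chars"]
```

In this model a flow that arrives while all locations are busy is simply not tracked, which mirrors the statement that the FC classifies a limited number of concurrent connections.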

A. Analyzer/Looper Block

The ANALYZER/LOOPER block is in charge of analyzing the packet flows coming from the MAC blocks, extracting the flow identifiers and the packet characteristics. Packets are stored in FIFO blocks, depending on which MAC block they come from. The start of frame signal (sof) and the TCP and IP headers are sent to a finite state machine FSM (simply based on header byte counters) to extract the flow identifiers and the packet characteristics. The sof signal is also used to derive the write enable for data entering a FIFO block: the ENA GEN block receives the start of frame signal (always 0, except for one clock cycle) and returns a write enable signal that holds the 1 state of the sof signal for a number of clock cycles corresponding to the length of the associated packet. Fig. 5 depicts the conceptual scheme representing this block.

Fig. 3. The TCP/IP flow classifier block scheme.

Fig. 4. Flow diagram of the algorithm implemented by ANALYZER/LOOPER, CAM and MEM blocks.

Fig. 6. Three-class CLASSIFIER block scheme.
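The behavior of the ENA GEN block can be illustrated with a cycle-by-cycle Python model. This is a sketch only: the function name is hypothetical, and the packet length in clock cycles is assumed to be known when the sof pulse arrives (how the hardware obtains it is not detailed in the text).

```python
def ena_gen(sof, lengths):
    """Stretch each one-cycle sof pulse into a write enable of packet length.

    sof:     list of 0/1 values, one per clock cycle
    lengths: packet length in clock cycles for each sof pulse, in order
    Returns the write enable signal, one 0/1 value per clock cycle.
    """
    enable = []
    remaining = 0  # clock cycles of the current packet still to be written
    pending = iter(lengths)
    for pulse in sof:
        if pulse:
            remaining = next(pending)  # a new packet starts on this cycle
        enable.append(1 if remaining > 0 else 0)
        if remaining > 0:
            remaining -= 1
    return enable
```

For instance, two sof pulses associated with packets lasting three and two cycles yield a write enable that stays high for exactly three and two cycles starting at the respective pulses.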

B. Classifier Block

In [17] we proposed an implementation of the Min-Max classification model targeted to an FPGA hardware device, searching for the best compromise between parallelization degree and hardware resource requirements, in terms of the number of LBs. The CLASSIFIER block conceptual implementation scheme is shown in Fig. 6, depicting the structure of a Min-Max neural network. This block receives the input feature vector X that has to be classified, performs all the operations defined in [17] and returns the class label C the vector X belongs to. Once the system is configured, the D samples of the input vector X enter the PREPROC block, element by element. Each element X(i) is first multiplied by the factor 1/γ to generate X'(i); then 1 is added to and subtracted from it, obtaining X'(i)+1 and X'(i)-1 respectively. The three data are sent to three delay lines (Fig. 7), so that all the j-th hyperboxes (each one belonging to a different class k) receive the samples from the same taps of the delay lines. Precalculating X'(i)+1 and X'(i)-1 and distributing them to all the hyperboxes allows the use of just two RAMs (V'-RAM and W'-RAM) to store the divided-by-γ coordinates {V'_ijk, W'_ijk}, i = 1, ..., D, needed to define the hyperboxes themselves. Let p_ijk represent the basic element of the new hyperbox membership function descending from the optimization of the architecture proposed in [17]:

$$
p_{ijk} =
\begin{cases}
1, & 0 \le X'(i) < V'_{ijk} - 1 \\
V'_{ijk} - X'(i), & V'_{ijk} - 1 \le X'(i) < V'_{ijk} \\
0, & V'_{ijk} \le X'(i) \le W'_{ijk} \\
X'(i) - W'_{ijk}, & W'_{ijk} < X'(i) \le W'_{ijk} + 1 \\
1, & W'_{ijk} + 1 < X'(i) \le \gamma
\end{cases}
\tag{2}
$$

Each hyperbox HB_jk calculates the sum of all the p_ijk. The hyperbox selects the value of p_ijk by evaluating the comparisons X'(i)+1 > V'_ijk, X'(i) > V'_ijk, X'(i) > W'_ijk and X'(i)-1 > W'_ijk, which can be easily obtained by manipulating (2). The four bit code produced by the four "greater than" comparators is sent to a DECODER block that calculates the correct selector for the 4-to-1 multiplexer. The p_ijk values come out of the multiplexer at the system clock rate and are accumulated by using a single adder with finite latency L_ADD. In an accumulator, the finite latency L_ADD of the adder block allows only data that are L_ADD clock cycles apart to be combined in the loop; therefore, once all the data to be summed have entered the accumulation stage, L_ADD partial sums have been calculated, and they would keep looping without combining into a single value. To overcome this limitation, the accumulation stage has been provided with a dedicated simple finite state machine ACC FSM that properly delays the fed-back data (by using the tapped delay line DLY-LINE) in order to combine them. The CMP_k blocks evaluate the membership value to a certain class by selecting the minimum value among all the values calculated by the hyperboxes representative of class k. This is obtained thanks to a simple COUNTER that provides the incremental selection signal for the multiplexer to send all the data (one at a time) to the comparison stage with memory. The last block is the final comparator WTA that, based on the same working principles as the CMP_k blocks, produces the class label C that identifies which class the feature vector X belongs to.
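The datapath of one hyperbox and of the final comparison stages can be mimicked in Python as follows. This is a behavioral sketch, not the RTL: the decoder mapping (counting the comparators that fire), the function names and the default latency value are assumptions made for illustration, and all inputs are assumed to be already scaled as X'(i), V'_ijk and W'_ijk with V'_ijk not greater than W'_ijk. The partial-sum function only models the effect of the adder latency, not the ACC FSM itself.

```python
def decoder(code):
    """Map the 4-bit comparator code to a multiplexer select (assumed mapping)."""
    ones = sum(code)  # with V' <= W' the code is a thermometer code
    return 0 if ones in (0, 4) else ones


def p_element(xp, vp, wp):
    """Select p_ijk for one coordinate using four 'greater than' comparators."""
    code = (xp + 1 > vp, xp > vp, xp > wp, xp - 1 > wp)
    mux_inputs = (1.0, vp - xp, 0.0, xp - wp)  # 4-to-1 multiplexer inputs
    return mux_inputs[decoder(code)]


def pipelined_partial_sums(values, latency):
    """Model an accumulator whose adder has `latency` cycles of delay.

    A value can only be added to the result that entered the adder `latency`
    cycles earlier, so `latency` interleaved partial sums are produced.
    """
    partial = [0.0] * latency
    for t, value in enumerate(values):
        partial[t % latency] += value
    return partial


def hyperbox_sum(x_scaled, v_scaled, w_scaled, latency=3):
    p = [p_element(x, v, w) for x, v, w in zip(x_scaled, v_scaled, w_scaled)]
    # The ACC FSM eventually combines the partial sums into a single value.
    return sum(pipelined_partial_sums(p, latency))


def classify(x_scaled, boxes, latency=3):
    """boxes: list of (v_scaled, w_scaled, label). Returns the winning label."""
    best = {}
    for v, w, label in boxes:
        s = hyperbox_sum(x_scaled, v, w, latency)
        best[label] = min(best.get(label, float("inf")), s)  # CMP_k: minimum
    return min(best, key=best.get)  # WTA: class with the smallest sum
```

Because p_ijk is 0 inside the hyperbox and grows to 1 away from it, the accumulated sum behaves as a distance-like quantity, which is consistent with the CMP_k and WTA stages selecting a minimum rather than a maximum.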

V. FPGA SYSTEM IMPLEMENTATION

As modern FPGAs allow complete systems to be implemented on a single device, this feature influenced our technological choice for the TCP/IP analyzer system implementation. We chose a Cyclone IV E FPGA by Altera, available on a Terasic DE2-115 development board. This solution has been preferred to others after evaluating the best quality/price tradeoff that the market offers: the presence of all the components and peripherals essential to develop a system prototype, and the very low cost. Although integrated transceivers are not available on Cyclone IV E, the board provides two Marvell 88E1111 PHYs (each one with its magnetics) that realize the two ports from which our system receives the traffic data. The two PHYs are linked to the FPGA by 8 bit lanes at 125 MHz, which is the chosen value for the system clock frequency in our design. The system clock is generated by an FPGA internal PLL driven by a 50 MHz board oscillator. We decided to employ a softcore microprocessor system based on the Nios II Altera macro IP (Intellectual Property). Consequently, the whole design can be thought of as split into two distinct sections, namely a custom logic design and a Nios II system design. The Nios II system includes the microprocessor itself (μP), the UART interface, the two MACs (implemented using the Triple Speed Ethernet Altera macro IP), the PHY control interfaces (based on the MDIO protocol) and the SDRAM DDR2 controllers. The custom logic design comprises, instead, the pre-processing blocks (flow identifier and packet characteristics extraction), the processing blocks (Min-Max classifier) and the CAM system for the storage of the flow identifiers. The implemented system can handle up to 512 flows with a classifier complexity of up to 100 hyperboxes when dealing with a 4-class problem, as in our case. To date, the RTL description of the system has been carried out and its development is nearing completion.

Fig. 5. ANALYZER/LOOPER block implementation scheme.
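As a quick sanity check of the figures quoted for the PHY interface and the clocking scheme, the short Python computation below derives the raw data rate of one 8 bit lane at 125 MHz and the frequency ratio the PLL has to realize. The constant names are illustrative; the only inputs are the numbers stated in the text.

```python
from fractions import Fraction

LANE_WIDTH_BITS = 8            # width of each PHY-to-FPGA lane
SYSTEM_CLOCK_HZ = 125_000_000  # system clock chosen in the design
BOARD_OSC_HZ = 50_000_000      # board oscillator driving the PLL

# Raw data rate of one lane: 8 bits transferred on every clock cycle.
line_rate_bps = LANE_WIDTH_BITS * SYSTEM_CLOCK_HZ

# Frequency multiplication the PLL must provide (output / input).
pll_ratio = Fraction(SYSTEM_CLOCK_HZ, BOARD_OSC_HZ)

# Duration of one clock cycle, i.e. the time budget per received byte.
clock_period_ns = 1e9 / SYSTEM_CLOCK_HZ
```

The lane rate works out to 1 Gbit/s per port, the PLL ratio to 5/2, and each byte corresponds to one 8 ns clock cycle.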

VI. CONCLUSION

Many interesting applications concerning quality of service and security issues in TCP/IP based communications require monitoring and controlling traffic flows. To this aim it is fundamental to develop real-time classification systems able to cope reliably with fast data links. In this paper we described a stand-alone FPGA based device able to recognize and possibly filter out data flows generated by undesired applications. The system, based on neurofuzzy Min-Max classifiers, achieves very interesting performance by relying only on information extracted from the headers of the very first packets of the flow to be classified, avoiding the high computational cost required by DPI techniques. For this reason, this approach is also able to cope with TCP/IP flows with encrypted payloads, such as SSH. The whole system is conceived to be placed in a single Cyclone IV E FPGA by Altera, available on the Terasic DE2-115 development board.

Fig. 7. Internal architecture of the highlighted blocks in Fig. 6.

REFERENCES

[1] A. Callado, C. Kamienski, G. Szabo, B. Gero, J. Kelner, S. Fernandes, and D. Sadok, "A Survey on Internet Traffic Identification", IEEE Comm. Surveys & Tutorials, vol. 11, no. 3, pp. 37-52, Aug. 2009.

[2] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, "Internet traffic classification demystified: myths, caveats and the best practices", Proc. of ACM SIGCOMM CoNEXT, Madrid, Spain, Dec. 2008.

[3] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, "BLINC: multilevel traffic classification in the dark", Proc. of ACM SIGCOMM, Philadelphia, USA, pp. 229-240, Aug. 2005.

[4] M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli, "Traffic classification through simple statistical fingerprinting", ACM SIGCOMM Computer Comm. Rev., vol. 37, no. 1, pp. 5-16, Jan. 2007.

[5] A. W. Moore and D. Zuev, "Internet traffic classification using Bayesian analysis techniques", Proc. of ACM SIGMETRICS, Karlsruhe, Germany, pp. 50-60, Aug. 2005.

[6] L. Bernaille, R. Teixeira, and K. Salamatian, "Early application identification", Proc. of ACM CoNEXT, Lisbon, Portugal, pp. 1-12, Dec. 2006.

[7] R. Alshammari and A. Nur Zincir-Heywood, "A flow based approach for SSH traffic detection", IEEE International Conf. on Systems, Man and Cybernetics, Montreal, Canada, pp. 296-301, Oct. 2007.

[8] A. Rizzi, M. Panella, and F. M. Frattale Mascioli, "Adaptive resolution Min-Max classifiers", IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 402-414, Mar. 2002.

[9] N. Williams, S. Zander, and G. Armitage, "A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification", ACM SIGCOMM Computer Comm. Rev., vol. 36, no. 5, pp. 7-15, Oct. 2006.

[10] P. K. Simpson, "Fuzzy Min-Max neural networks - Part I: classification", IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 776-786, Sep. 1992.

[11] J. Xue, L. Sun, M. Liu, C. Qiao, and G. Ye, "Research on high-speed fuzzy reasoning with FPGA for fault diagnosis expert system", Proc. of ICMA, Changchun, China, pp. 3005-3009, Aug. 2009.

[12] D. N. Oliveira, G. A. De Lima Henn, and O. Da Mota Almeida, "Design and implementation of a Mamdani fuzzy inference system on an FPGA using VHDL", Proc. of NAFIPS, Toronto, Canada, pp. 1-6, Jul. 2010.

[13] Wan-De Weng and Rui-Chang Lin, "An FPGA-based neural network digital channel equalizer", Proc. of International Conf. on Machine Learning and Cybernetics, vol. 4, Hong Kong, pp. 1903-1908, Aug. 2007.

[14] A. Rizzi, N. M. Buccino, M. Panella, and A. Uncini, "Genre classification of compressed audio data", Workshop on Multimedia Signal Processing, Cairns, Australia, pp. 654-659, Oct. 2008.

[15] A. Rizzi, F. M. Frattale Mascioli, F. Baldini, C. Mazzetti, and R. Bartnikas, "Genetic optimization of a PD diagnostic system for cable accessories", IEEE Trans. on Power Delivery, vol. 24, no. 3, pp. 1728-1738, Jul. 2009.

[16] G. Del Vescovo, M. Paschero, A. Rizzi, R. Di Salvo, and F. M. Frattale Mascioli, "Multi-fault diagnosis of rolling-element bearings in electric machines", XIX International Conference on Electrical Machines, Rome, Italy, pp. 1-6, Sept. 2010.

[17] A. Cinti and A. Rizzi, "Neurofuzzy Min-Max networks implementation on FPGA", Proc. of IJCCI (NCTA), Paris, France, pp. 51-57, Nov. 2011.

[18] A. Rizzi, S. Colabrese, and A. Baiocchi, "Low complexity, high performance neuro-fuzzy system for Internet traffic flows early classification", IEEE Proc. of 4th International Workshop on TRAC, Cagliari, Italy, pp. 77-82, Jul. 2013.