A scalable multithreaded L7-filter design for multi-core servers

Authors: Danhua Guo, Guangdeng Liao, Laxmi N. Bhuyan, Bin Liu, Jianxun Jason Ding
Conf.: The 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '08)
Presenter: JHAO-YAN JIAN
Date: 2010/11/10


Introduction

Traditional packet classification makes its decision based on packet header information. But many applications, such as P2P and HTTP, hide their application characteristics in the payload.

The original L7-filter is a sequential DPI (Deep Packet Inspection) program that identifies protocol information in a given connection.

A traditional single-core server is insufficient to satisfy DPI functionality on high-speed networks, such as 10 Gigabit Ethernet.

Despite the enhanced processing power of a multi-core architecture, efficient core utilization remains a challenge.


Network traffic in the original L7-filter is captured by Netfilter, a set of hooks inside the Linux kernel that allows kernel modules to register callback functions with the network stack.

Inside the network stack of the kernel, a series of operations is executed to establish a connection buffer based on the 5-tuple connection information in the packet header. These operations include TCP/IP packet checksum verification, TCP/IP reassembly, IP refragmentation, etc.

After this preprocessing stage, L7-filter starts to match all the application-layer data of the arriving packets in the same connection against the protocol database in a sequential fashion.
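The per-connection buffering and sequential matching described above can be sketched roughly as follows. This is a minimal Python illustration, not the L7-filter code: the `PROTOCOL_DB` patterns are simplified stand-ins for the real protocol database, and `FiveTuple`, `append_payload`, and `classify` are hypothetical names introduced here.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    """Connection key taken from the packet header."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


# Illustrative patterns only (assumption); the real L7-filter protocol
# database is larger and its expressions are more involved.
PROTOCOL_DB = [
    ("http", re.compile(rb"^(GET|POST|HEAD) [^ ]+ HTTP/1\.[01]")),
    ("ssh", re.compile(rb"^SSH-[12]\.[0-9]")),
    ("bittorrent", re.compile(rb"^\x13BitTorrent protocol")),
]

# One reassembled payload buffer per connection, keyed by the 5-tuple.
connection_buffers = defaultdict(bytearray)


def append_payload(conn: FiveTuple, payload: bytes) -> None:
    """Add newly reassembled application-layer bytes to the connection buffer."""
    connection_buffers[conn] += payload


def classify(conn: FiveTuple) -> Optional[str]:
    """Try each protocol pattern in order; return the first that matches."""
    buf = bytes(connection_buffers[conn])
    for name, pattern in PROTOCOL_DB:
        if pattern.search(buf):
            return name
    return None
```

Because every pattern is tried in turn against the whole buffer, the cost grows with both the buffer length and the number of protocols, which is consistent with pattern matching being the bottleneck.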

Decoupling Linux L7-filter operations

Previous research from both academia and industry has demonstrated that the performance of L7-filter is bounded by the cost of pattern matching.

Therefore, the authors developed a decoupled model that separates out packet arrival handling and focuses on optimizing the pattern matching operations at the application layer.

The L7-filter operations are parallelized based on a user-space version.

Modeling Single-Threaded L7-filter

The authors choose libnids as the user-space module. Libnids reads tcpdump trace files and simulates kernel network stack behaviors in user space.

Libnids offers IP defragmentation, TCP stream assembly and TCP port scan detection.

The original online L7-filter is replaced by a combination of a Preprocessing Thread (PT) and a Matching Thread (MT).

At any point of processing, a connection has exactly one of three statuses: 1) MATCHED, 2) NO_MATCH, or 3) NO_MATCH_YET.
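The PT/MT split and the three connection statuses might look like the sketch below. It is only an illustration under stated assumptions: the PT is a stand-in that forwards pre-built (connection id, payload) pairs instead of calling libnids, `PATTERNS` holds one toy expression, and the `MAX_PACKETS` cutoff that turns NO_MATCH_YET into NO_MATCH is a hypothetical policy, not one taken from the slides.

```python
import queue
import re
import threading
from enum import Enum


class Status(Enum):
    MATCHED = 1
    NO_MATCH = 2
    NO_MATCH_YET = 3


# Assumptions for illustration: one toy pattern, and a cutoff after which
# an unmatched connection is given up on.
PATTERNS = {"http": re.compile(rb"^GET [^ ]+ HTTP/1\.[01]")}
MAX_PACKETS = 3
STOP = None  # sentinel that tells the MT to exit


def preprocessing_thread(packets, work_queue):
    """PT: stand-in for libnids; forwards (connection id, payload) pairs."""
    for conn_id, payload in packets:
        work_queue.put((conn_id, payload))
    work_queue.put(STOP)


def matching_thread(work_queue, results):
    """MT: appends payloads to per-connection buffers and matches them."""
    buffers, seen = {}, {}
    while True:
        item = work_queue.get()
        if item is STOP:
            break
        conn_id, payload = item
        if results.get(conn_id, Status.NO_MATCH_YET) is not Status.NO_MATCH_YET:
            continue  # connection already finalized
        buffers[conn_id] = buffers.get(conn_id, b"") + payload
        seen[conn_id] = seen.get(conn_id, 0) + 1
        if any(p.search(buffers[conn_id]) for p in PATTERNS.values()):
            results[conn_id] = Status.MATCHED
        elif seen[conn_id] >= MAX_PACKETS:
            results[conn_id] = Status.NO_MATCH
        else:
            results[conn_id] = Status.NO_MATCH_YET


def run(packets):
    """Start one PT and one MT, wait for both, and return the statuses."""
    work_queue, results = queue.Queue(), {}
    pt = threading.Thread(target=preprocessing_thread, args=(packets, work_queue))
    mt = threading.Thread(target=matching_thread, args=(work_queue, results))
    pt.start()
    mt.start()
    pt.join()
    mt.join()
    return results
```

The queue is the only shared structure between the two threads, which is what makes it possible to add more MTs later without touching the preprocessing side.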


Parallelizing L7-filter at Connection Level

Once more MTs are created, each MT executes on a per-connection-buffer basis. When a new packet is reassembled for a connection, randomly selecting the non-empty run queue of a thread introduces additional cache overhead, because packets of the same connection are copied to different cores.

In addition, it wastes thread resources. The authors believe that dispatching an independent thread to a dedicated core saves scheduling overhead and reduces the cache misses introduced by live migrations under unbalanced workloads.
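One way to avoid random queue selection is to map each connection to a fixed MT, so that all of its packets land in the same run queue. The sketch below assumes a simple hash-based mapping; the slides do not say which dispatch policy the authors use, and `NUM_MTS`, `pick_mt`, and `dispatch` are hypothetical names. Pinning each MT to its own core (for example with `os.sched_setaffinity` on Linux) is left out to keep the example portable.

```python
import zlib
from collections import deque

NUM_MTS = 4  # assumed number of matching threads

# One run queue per MT; in a real design each MT would drain its own queue.
run_queues = [deque() for _ in range(NUM_MTS)]


def pick_mt(five_tuple, num_mts=NUM_MTS):
    """Map a connection to a fixed MT index using a stable hash."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_mts


def dispatch(five_tuple, payload):
    """Enqueue a reassembled payload on the run queue that owns the connection."""
    index = pick_mt(five_tuple)
    run_queues[index].append((five_tuple, payload))
    return index
```

A static hash keeps a connection's data on one core, but it does nothing about load imbalance between queues, so it should be read as a starting point only.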


Experiment Platform

The server system has two CPU sockets, each holding a quad-core Xeon X5355 2.66 GHz processor, and 16 GB of 667 MHz DDR2 SDRAM. Each socket has two 4 MB shared L2 caches.

Linux kernel 2.6.18 is used as the default OS.

Throughput and Core Utilization

With 7 concurrent threads, the system throughput increases by 51% compared to naive OS scheduling. The system scales nearly linearly with the number of MTs (a speedup of 6.5X when 7 threads are applied).
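To put "nearly linearly" in numbers, the reported speedup can be turned into a parallel efficiency, that is, the fraction of ideal linear speedup achieved. The helper name `parallel_efficiency` is introduced here for illustration; the inputs are the two figures quoted above.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / threads


efficiency = parallel_efficiency(6.5, 7)
print(f"{efficiency:.1%}")  # prints 92.9%
```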


Cache Performance


A Life-of-Packet Analysis
