©2009 Check Point Software Technologies Ltd. All rights reserved.

Classification: [Unrestricted]—For everyone

Performance Optimization Guide

Table of Contents

Preface
Open Performance Architecture Overview
    SecureXL
    CoreXL
    ClusterXL
    Packet flows
Optimizing Server Hardware and Operating System
    Hyper-Threading
    NIC Properties
    CPU Speed
    ARP Cache Table
Optimizing Network Performance
    Working with SecureXL
    Working with CoreXL
    Working with ClusterXL
    Memory Allocation
    SmartView Tracker Logs and dmesg Output
Optimizing the Session Rate
    Working with SecureXL
    Working with ClusterXL
    Improving NAT Session Rate
References

Preface

This document describes how to optimize the performance of the Security Gateway for version R70 and later. It also gives an overview of some of the Firewall technologies. This overview provides a basic understanding of how to configure the gateway parameters for the best network performance.

Open Performance Architecture Overview

The R70 Security Gateway includes the Open Performance Architecture, a framework of technologies designed to accelerate security performance. This framework includes:

SecureXL - Accelerates traffic using specialized hardware/software

CoreXL - Utilizes multiple cores

ClusterXL - Utilizes multiple machines for redundancy/Load Sharing

All three technologies can work together to maximize their unique advantages.

SecureXL

SecureXL is a technology that enables offloading security processing to processing units

(hardware or software). This allows fast processing of the traffic and enables high-speed

performance.

The firewall module handles the first packet of a connection and offloads the relevant information to the SecureXL device. The SecureXL device then processes all subsequent packets of that connection. The firewall can also offload connection templates to the SecureXL device. In this case, a new connection that matches a template is created directly in the device, and the firewall does not process even the first packet. This feature is designed to improve the connection establishment rate.
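The relationship between offloaded connections and templates can be pictured with the toy model below. It is a minimal Python sketch, not Check Point's implementation. It assumes, as a simplification, that a template is a connection key with the source port wildcarded. The class and method names (`Accelerator`, `offload_connection`, `offload_template`) are hypothetical.

```python
from typing import NamedTuple


class ConnKey(NamedTuple):
    """5-tuple that identifies a connection in this toy model."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def template_of(key: ConnKey) -> tuple[str, str, int, str]:
    # Assumption: a template is the connection key without the source port.
    return (key.src_ip, key.dst_ip, key.dst_port, key.proto)


class Accelerator:
    """Hypothetical model of a SecureXL device's connection and template tables."""

    def __init__(self) -> None:
        self.connections: set[ConnKey] = set()
        self.templates: set[tuple[str, str, int, str]] = set()

    def offload_connection(self, key: ConnKey) -> None:
        # The firewall calls this after it has inspected the first packet.
        self.connections.add(key)

    def offload_template(self, key: ConnKey) -> None:
        self.templates.add(template_of(key))

    def handle(self, key: ConnKey) -> str:
        """Report which component processes a packet with this key."""
        if key in self.connections:
            return "accelerated"
        if template_of(key) in self.templates:
            # The connection is created inside the device; the firewall
            # never sees this first packet.
            self.connections.add(key)
            return "accelerated (from template)"
        return "firewall"
```

Suppose the firewall offloads a connection from 10.0.0.1 to 192.0.2.10:80 together with its template. In this model, a second connection from the same client on a different source port is then handled entirely by the device.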

Performance Pack is a SecureXL device implemented in software, designed to benefit from multi-core CPU architectures.

CoreXL

CoreXL is a technology that allows Firewall and IPS security code to run on multiple

processors concurrently. The CoreXL layer accelerates traffic that cannot be handled by the

SecureXL device or traffic that requires deep packet inspection.

CoreXL provides near-linear performance scalability with the number of processing cores on a single machine. This increase in performance is achieved without requiring any changes to management or network topology.

In a CoreXL gateway, the firewall kernel is replicated so that each replicated copy (instance)

runs on a processing core. These instances handle traffic concurrently, and each instance is

a complete and independent inspection kernel.

ClusterXL

ClusterXL is a software-based Load Sharing and High Availability solution that distributes network traffic between clusters of redundant Security Gateways. It also provides transparent failover between machines in a cluster.

A Security Gateway Cluster is a group of identical gateways that are connected, so that if

one fails, another immediately takes its place.

ClusterXL provides an infrastructure that ensures that no data is lost in case of a failover,

because each Gateway Cluster member is aware of the connections passing through the

other members via state synchronization.

ClusterXL Operation Modes

ClusterXL can be configured to operate in three different modes:

High Availability Mode

Load Sharing Multicast Mode

Load Sharing Unicast Mode

Each mode has its relative advantages and disadvantages.

High Availability Mode

When ClusterXL is set to High Availability mode, it designates one of the cluster members as

the active machine and the rest of the members are kept in a stand-by mode. All traffic is

directed to the active member. The active member updates the stand-by members of any

state changes, so that if the active member goes down, a stand-by member can immediately take its place.

In this mode you only utilize the processing power of a single machine.

Load Sharing Mode

When ClusterXL is set to Load Sharing mode, you can distribute network traffic between the

cluster members. Unlike High Availability mode, where only a single member is active at any

given time, in Load Sharing mode all the cluster members are active. The whole cluster is

responsible for assigning a portion of the traffic to each cluster member and this usually

leads to an increase in total throughput of the cluster.

ClusterXL offers two separate Load Sharing solutions: Multicast mode and Unicast mode.

The difference between the two modes is how the members receive the packets sent to the

cluster.

Multicast mode - all packets sent to the cluster reach all the members in the cluster. Each member then decides whether it should process the packets. This mode provides a better connection establishment rate than Unicast mode.

Unicast mode - a single cluster member, referred to as the pivot, receives all the packets sent to the cluster. The pivot then propagates the packets to the other cluster members, which creates a Load Sharing mechanism. The pivot still acts as a firewall module that processes packets. However, the other members can perform other tasks for the pivot in order to reduce its total load.

NOTE: To support ClusterXL Load Sharing Multicast, extra configuration settings may be

required on the connected router. For more information on ClusterXL Load Sharing Multicast

configuration mode, see the R70 ClusterXL Administration Guide.
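In Multicast mode every member receives every packet, so all members must reach the same decision independently. The sketch below shows one way to do that, assuming a deterministic hash over the sorted pair of IP addresses. The hash (SHA-256) and the function names are assumptions for illustration only. This guide does not describe ClusterXL's actual decision function.

```python
import hashlib


def owner_member(src_ip: str, dst_ip: str, member_count: int) -> int:
    """Choose the member that processes a packet (illustrative only).

    Sorting the two addresses gives the same answer for both directions
    of a connection, so the reply reaches the same member.
    """
    low, high = sorted((src_ip, dst_ip))
    digest = hashlib.sha256(f"{low}|{high}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % member_count


def should_process(member_id: int, src_ip: str, dst_ip: str,
                   member_count: int) -> bool:
    # Every member runs the same function on the same packet, so exactly
    # one member returns True and the others ignore the packet.
    return owner_member(src_ip, dst_ip, member_count) == member_id
```

Hashing the sorted pair is what keeps a connection sticky in this model. A scheme that hashed source and destination in order could send the two directions of a connection to different members.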

Packet flows

When SecureXL is enabled, a packet enters the firewall and first reaches the SecureXL

device. The device can choose to handle the packet in three ways:

1. Acceleration path - The packet is completely handled by the SecureXL device. It is processed and sent back again to the network. This path does all the IPS processing when CoreXL is disabled.

2. Medium path - The packet is handled by the SecureXL device, except for IPS processing. The CoreXL layer passes the packet to one of the firewall instances, to perform IPS processing. This path is only available when CoreXL is enabled.

3. Firewall path - The SecureXL device is unable to process the packet. It is passed on to the CoreXL layer and then to one of the instances, for full firewall processing. This path also processes all packets when SecureXL is disabled.
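The three paths can be summarized as a decision function. The Python sketch below only restates the rules listed above. The boolean inputs `can_accelerate` and `needs_ips` are hypothetical simplifications of the checks that the SecureXL device actually performs.

```python
from enum import Enum


class Path(Enum):
    ACCELERATION = "acceleration path"
    MEDIUM = "medium path"
    FIREWALL = "firewall path"


def choose_path(securexl_enabled: bool, corexl_enabled: bool,
                can_accelerate: bool, needs_ips: bool) -> Path:
    """Restate the packet-flow rules of this section (simplified model)."""
    if not securexl_enabled or not can_accelerate:
        # SecureXL is disabled, or the device cannot process this packet:
        # full firewall processing in one of the instances.
        return Path.FIREWALL
    if needs_ips and corexl_enabled:
        # Only the IPS processing is done by a firewall instance.
        return Path.MEDIUM
    # Fully handled by the SecureXL device. When CoreXL is disabled,
    # this path also performs the IPS processing.
    return Path.ACCELERATION
```

For example, a packet that needs IPS inspection takes the medium path when CoreXL is enabled, and the acceleration path when CoreXL is disabled.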

The following diagram displays the three different packet flows.

[Figure: The three packet flows. Packets enter the Performance Pack, which handles the Accelerated Path. The Dispatcher passes other packets through per-instance queues to kernel Instances 0 to N, which handle the Medium Path and the Firewall Path.]

Optimizing Server Hardware and Operating System

The configuration of the server's hardware and operating system can affect the performance of the R70 Security Gateway. A server that is not configured properly reduces network performance. Some of these configurations are relevant only to an open server. Configure the server as described below to optimize performance.

If you are using a Check Point appliance, you only need to refer to the ARP Cache Table

section.

Hyper-Threading

Hyper-Threading can have a negative impact on the performance of the R70 Security Gateway. It is recommended that you disable it.

If you are using a Check Point appliance, Hyper-Threading is disabled by default.

NIC Properties

This configuration is only for an open server. There are four issues related to the NIC that

can affect performance of the R70 Security Gateway.

1. HCL support

Verify that you are using certified NICs, using the following link: http://www.checkpoint.com/services/techsupport/hcl/index.html

2. PCI Express

Use PCI Express NICs, because they perform better than PCI-X NICs.

3. Speed

Use ethtool <interface name> to verify that the NIC is working at the desired

speed and using full-duplex settings.

4. Statistics

Use ethtool -S <interface name> to check the NIC statistics. A properly working system should display minimal rx/tx drop and error counts.
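The statistics check can be scripted. The Python sketch below parses `ethtool -S` output and lists non-zero counters whose names contain "drop" or "err". Counter names differ between NIC drivers, so the substring match is an assumption. The counter names in the test sample are illustrative.

```python
import subprocess


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse `ethtool -S <iface>` output into {counter_name: value}."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def problem_counters(stats: dict[str, int]) -> dict[str, int]:
    """Return non-zero counters whose names mention drops or errors.

    Matching on "drop" and "err" is a heuristic, not an exhaustive check.
    """
    return {name: value for name, value in stats.items()
            if value and ("drop" in name or "err" in name)}


def read_nic_stats(iface: str) -> dict[str, int]:
    # Requires ethtool on the machine; raises CalledProcessError on failure.
    out = subprocess.run(["ethtool", "-S", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_stats(out)
```

On a healthy system, problem_counters(read_nic_stats("eth1")) should return an empty or nearly empty dictionary.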

CPU Speed

This configuration is only for an open server. If performance is low, use the cat /proc/cpuinfo command to display the CPU model and speed. You may be able to improve performance by upgrading to a CPU with a higher clock frequency.
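A small Python helper can extract the same information from /proc/cpuinfo. It reads only the standard `model name` and `cpu MHz` fields. Note that `cpu MHz` may show the current, power-managed frequency rather than the rated speed, so treat the value as an indication only.

```python
def summarize_cpuinfo(text: str) -> list[tuple[str, float]]:
    """Return (model name, MHz) for each logical CPU in /proc/cpuinfo text."""
    cpus: list[tuple[str, float]] = []
    model, mhz = None, None
    # The appended "" guarantees that the last processor block is flushed.
    for line in text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor's block.
            if model is not None and mhz is not None:
                cpus.append((model, mhz))
            model, mhz = None, None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "model name":
            model = value.strip()
        elif key == "cpu MHz":
            mhz = float(value)
    return cpus


def read_cpuinfo(path: str = "/proc/cpuinfo") -> list[tuple[str, float]]:
    with open(path, encoding="utf-8") as f:
        return summarize_cpuinfo(f.read())
```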

ARP Cache Table

This configuration is relevant to a Check Point appliance and an open server. The default

limit of the kernel ARP Cache table is 1024 entries. You can increase the number of entries

to improve network performance. You should increase the ARP Cache table if the dmesg

command displays the message “Neighbour table overflow”.

NOTE: You should also increase the ARP Cache table if you are testing large subnets that

are directly connected to the gateway without a router.

To change the number of ARP entries:

The number of ARP entries is controlled by the net.ipv4.neigh.default.gc_thresh3

parameter. There are two ways to change the number of ARP entries:

Edit the /etc/sysctl.conf file and run the sysctl -p command. This change persists after reboot. (See Example 1.)

Run the sysctl -w command. This change does not persist after reboot. (See Example 2.)

The following examples demonstrate how to increase the number of ARP entries to 4096, to

allow for 4096 IPs.

Example 1

Modify the /etc/sysctl.conf file to include the lines:

net.ipv4.neigh.default.gc_thresh3 = 4096

net.ipv4.neigh.default.gc_thresh2 = 2048

Run the sysctl -p command for the change to take effect.

Example 2

Run the commands:

sysctl -w net.ipv4.neigh.default.gc_thresh3=4096

sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
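The check and the configuration change can be prepared with a short Python helper. It counts "Neighbour table overflow" messages in dmesg output and reads the current thresholds from /proc/sys, the file-system view of the same sysctl parameters. It also builds the sysctl.conf lines from Example 1. Setting gc_thresh2 to half of gc_thresh3 mirrors the 4096/2048 example above. That ratio is an assumption, not a stated rule.

```python
from pathlib import Path

OVERFLOW_MSG = "Neighbour table overflow"
# sysctl net.ipv4.neigh.default.* maps to files under this directory.
PROC_DIR = Path("/proc/sys/net/ipv4/neigh/default")


def arp_overflow_count(dmesg_text: str) -> int:
    """Count kernel messages reporting a full ARP (neighbour) table."""
    return sum(OVERFLOW_MSG in line for line in dmesg_text.splitlines())


def current_thresholds(proc_dir: Path = PROC_DIR) -> dict[str, int]:
    """Read the current gc_thresh2 and gc_thresh3 values (Linux only)."""
    return {name: int((proc_dir / name).read_text())
            for name in ("gc_thresh2", "gc_thresh3")}


def sysctl_conf_lines(max_entries: int) -> list[str]:
    """Build /etc/sysctl.conf lines in the style of Example 1.

    Assumption: gc_thresh2 is half of gc_thresh3, as in the example.
    """
    return [
        f"net.ipv4.neigh.default.gc_thresh3 = {max_entries}",
        f"net.ipv4.neigh.default.gc_thresh2 = {max_entries // 2}",
    ]
```

On a gateway, you would pass the output of dmesg to arp_overflow_count and compare current_thresholds() with the target before you edit /etc/sysctl.conf.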

Optimizing Network Performance

This section discusses factors that affect network performance.

Working with SecureXL

This section discusses how SecureXL can have an impact on network performance.

Conditions that Preclude Accelerated Traffic

When SecureXL is enabled, traffic is expected to be accelerated. However, traffic that matches the following conditions is not accelerated:

Enabling some features can disable SecureXL altogether. For example:

o ClusterXL sticky decision function

o QoS

The first packet of any new TCP session, unless a template exists.

The first packet of any session that requires NAT.

The first packet of any new UDP session, unless a template exists.

All traffic that matches a service that uses a resource.

All traffic that is supposed to be dropped or rejected, according to the rule base (consider

enabling Drop Templates - see below).

All traffic whose source or destination is the gateway itself.

All traffic that matches a rule with user authentication or session authentication.

All traffic that requires antivirus or anti-spam filtering.

Non-TCP/UDP/GRE/ESP traffic.

All multicast traffic.

All fragmented traffic.

All traffic with IP options.

RST packets, when the "Spoofed Reset Protection" feature is activated.

Traffic that is suspected to violate firewall protections, such as TCP sequence verification

(packets with abnormal sequences) or anti-spoofing (packets which come from an

unexpected interface).

Managing Non-Accelerated Traffic

Usually, the majority of network traffic should be accelerated when you are running

SecureXL. If you suspect that the majority of traffic is non-accelerated, you may need to

analyze SecureXL logs to identify the cause.

There are two actions that you can perform:

1. Confirm that the majority of the traffic is non-accelerated.

2. Review and tune the firewall policy and IPS protections (refer to sk33250 and the R70 IPS Administration Guide).

Confirming Non-Accelerated Traffic

Use the fwaccel stats command to compare the amount of non-accelerated traffic with accelerated traffic. In the following example, there are 124 accelerated packets and 766,058 non-accelerated packets.

# fwaccel stats
Name                 Value           Name                 Value
-------------------- --------------- -------------------- ---------------
conns created        480             conns deleted        471
temporary conns      0               templates            0
nat conns            0               accel packets        124
accel bytes          13360           F2F packets          766058
ESP enc pkts         0               ESP enc err          0
ESP dec pkts         0               ESP dec err          0
ESP other err        0               espudp enc pkts      0
espudp enc err       0               espudp dec pkts      0
espudp dec err       0               espudp other err     0
AH enc pkts          0               AH enc err           0
AH dec pkts          0               AH dec err           0
AH other err         0               memory used          0
free memory          0               acct update interval 3600
current total conns  8               TCP violations       0
conns from templates 0               TCP conns            4
delayed TCP conns    0               non TCP conns        4
delayed nonTCP conns 0               F2F conns            8
F2F bytes            48076865        crypt conns          0
enc bytes            0               dec bytes            0

Name (Statistic Parameter)   Explanation
accel packets                Number of accelerated packets
accel bytes                  Number of accelerated traffic bytes
F2F packets                  Number of packets handled by the Security Gateway in the firewall path (slow path)
conns from templates         Number of connections created from templates
F2F bytes                    Number of traffic bytes handled by the Security Gateway in the firewall path
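The ratio of accelerated to non-accelerated packets can be computed from this output. The Python sketch below parses the two-column `fwaccel stats` layout shown above into a dictionary. It assumes that every counter name starts with a letter and is followed by an integer value. The share calculation treats accel packets plus F2F packets as the total, which ignores any counters that are not present in this output.

```python
import re

# A counter name starts with a letter and may contain letters, digits and
# spaces; it is followed by spaces or tabs and an integer value.
PAIR_RE = re.compile(r"([A-Za-z][\w ]*?)[ \t]+(\d+)")


def parse_fwaccel_stats(text: str) -> dict[str, int]:
    """Parse `fwaccel stats` output into {counter_name: value}."""
    return {name.strip(): int(value) for name, value in PAIR_RE.findall(text)}


def accelerated_share(stats: dict[str, int]) -> float:
    """Fraction of packets that took the accelerated path (simplified)."""
    accel = stats.get("accel packets", 0)
    total = accel + stats.get("F2F packets", 0)
    return accel / total if total else 0.0
```

For the output above, accelerated_share returns roughly 0.00016, so about 0.016% of the packets were accelerated. This confirms that almost all traffic takes the firewall path.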

TIP: You can use the following commands to enable debugging in SecureXL and

Performance Pack in order to understand and identify causes for non-accelerated traffic.

Command                              Explanation
fw ctl debug -buf 32000              Set the debug buffer
fwaccel dbg + offload                Debug the SecureXL offload mechanism
sim dbg + f2f                        Debug Performance Pack forward-to-firewall incidents
fw ctl kdebug -T -f > debug.txt &    Forward debug output to a file

NOTE: Enabling debug might have a negative impact on performance.

To disable debug:

Run the sim dbg resetall and fw ctl debug 0 commands.

Disabling Performance Pack

If the majority of traffic cannot be accelerated, disabling the Performance Pack might

improve performance.

To disable Performance Pack:

Run the cpconfig command.

An interactive menu is displayed. Select Enable/Disable Check Point SecureXL, and then select Disable to disable accelerated traffic. To enable accelerated traffic again, select Enable.

IPS Protections

Some protections can have an adverse effect on the performance of the gateways on which they are activated, because they use more resources or apply to common types of traffic.

Protections with a critical performance impact normally prevent SecureXL from

accelerating the traffic and can significantly reduce network performance.

Protections with a high performance impact may also reduce network performance.

Protections that have a critical or high performance impact should be activated only if their severity is critical or high, or if they are specifically needed. If your gateways experience a heavy traffic load, be careful when you activate protections with a high or critical performance impact on profiles that affect a large number of mixed (client and server) machines.

IPS Exceptions

For protections which prevent SecureXL from accelerating traffic, the IPS exception

mechanism allows SecureXL to accelerate connections that match the exception rules.

For example:

“Network Quota” protection in R70 does not disable SecureXL templates on connections

that match the protection's exception rules.

IP ID Masking and TTL Masking (Fingerprint Scrambling) protections do not disable templates and acceleration on connections that match these protections' exception rules.

For further information regarding IPS, refer to the R70 IPS Administration Guide.

Drop Templates

You should enable drop templates to improve the Security Gateways’ performance when a

large part of the traffic matches a drop rule. This feature allows Performance Pack to handle

the drops. This feature is disabled by default.

To enable drop templates:

1. In SmartDashboard, open Policy > Global Properties.

2. Select the SmartDashboard Customization window and click Configure.

3. Select Firewall-1>SecureXL.

4. Check enable_drop_templates.

The following table contains CLI commands that can help you manage drop templates:

Command                Result
fwaccel stat           Check the status of drop templates
fwaccel templates -d   View the current drop templates
fwaccel stats -d       Get statistics about drop templates
sim ranges -a          View the Security Gateway's rule base ranges (output goes to /var/log/messages)

The output of fwaccel stats -d contains an index of ranges. If you correlate this index with the output of sim ranges, you can better understand the practical ranges for drop templates and when it is appropriate to use them.

Working with CoreXL

This section discusses how CoreXL can have an impact on network performance.

CPU Roles

The cores in a multi-core machine can assume several roles, including:

Secure Network Dispatcher (SND)

Kernel Instance

Daemon

Secure Network Dispatcher (SND)

This role is responsible for:

Processing incoming traffic from the network interfaces.

If Performance Pack is running - processing packets which can be accelerated

(acceleration path).

Distributing non-accelerated packets among kernel instances for IPS and Firewall

inspection.

Traffic entering network interface cards (NICs) is directed to a processing core running the

SND. The association of a particular interface with a processing core is called the interface’s

affinity with that core. This affinity causes the interface’s traffic to be directed to that core, where the SND runs.

Kernel instance

A firewall kernel instance runs on a particular core and is responsible for the following:

Firewall processing (firewall path)

IPS processing (medium path)

Traffic which is not accelerated by Performance Pack is forwarded to one of the instances

for further processing.

Daemon

The firewall daemon (fwd) and other daemons can be configured to run on a dedicated

core.

For the firewall daemon, this is useful when heavy logging consumes a lot of CPU resources.

IMPORTANT: Under normal circumstances, it is not recommended for the SND and an

instance to share a core. However, it is necessary in the following cases:

1. When using a machine with only two cores. It is better for both SND and instances

to share cores, instead of giving each only one core.

2. When you know that almost all of the packets are being processed in the

accelerated path, and you want to assign all CPUs to this path. If the instances do

not receive significant work, then it is appropriate to share the cores.

Balancing Core Utilization

In many cases, an overloaded core creates a performance bottleneck. Balance the CPU usage between the cores to optimize performance.

Optimizing Core Utilization

In some cases, you should change the default configuration and divide the cores between

kernel instances and SND for optimal performance.

The following table describes the default configuration of cores and kernel instances:

Number of Cores    Number of Kernel Instances
1                  CoreXL is disabled
2                  2
4                  3
8                  6

For more information on configuring the cores, refer to the Check Point R70 Firewall Administration Guide.

To optimize core utilization:

1. Use the fw ctl affinity -l -r command to understand the role of each CPU.

You can view the cores that are handling kernel instances.

2. Cores that do not have a kernel instance running are for SND to use. The interfaces'

affinity should only be mapped to these cores.

3. Run the top command to see which cores are heavily utilized.

a. If SND cores are more heavily used than instance cores - you may want to

decrease the number of instances, to allow SND to use another core.

b. If instance cores are more heavily used than SND cores - you may want to

increase the number of instances, to share the work among more instances.

To increase or decrease the number of instances, use the CoreXL

configuration menu in cpconfig.

NOTE: After you run the top command, press 1 to view usage per CPU. To make this the default view, press SHIFT+W.
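Steps 3a and 3b amount to a simple comparison, sketched below in Python. The inputs are assumptions. `roles` is a CPU-to-role map transcribed from fw ctl affinity -l -r, and `usage` holds the per-CPU percentages read from top. The 10-point `margin` is a hypothetical tolerance, not a Check Point recommendation.

```python
from statistics import mean


def recommend_instance_change(roles: dict[int, str], usage: dict[int, float],
                              margin: float = 10.0) -> str:
    """Apply steps 3a/3b to per-CPU usage percentages.

    roles maps a CPU id to "snd" or "instance"; margin is a hypothetical
    tolerance in percentage points before a change is suggested.
    """
    snd = [usage[cpu] for cpu, role in roles.items() if role == "snd"]
    inst = [usage[cpu] for cpu, role in roles.items() if role == "instance"]
    if not snd or not inst:
        return "no change (a role has no dedicated cores)"
    snd_avg, inst_avg = mean(snd), mean(inst)
    if snd_avg - inst_avg > margin:
        return "decrease the number of instances"  # step 3a
    if inst_avg - snd_avg > margin:
        return "increase the number of instances"  # step 3b
    return "no change"
```

The suggestion is only an input to your decision. The actual change is still made in the CoreXL configuration menu of cpconfig.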

Distributing Interfaces to the Cores

You should distribute the interface affinities equally between the cores that are available for SND processing. The default configuration is:

If Performance Pack is enabled - interface affinity is handled in automatic mode. In this mode, Performance Pack determines affinity based on the load. You may be able to improve performance by switching to manual mode and setting the interface affinity yourself.

If Performance Pack is disabled - the affinity of all interfaces is mapped to a single core. If you have more than one core available, change the affinity of some interfaces to use the other cores.

To distribute the interfaces:

1. Run the top command to display how the SND cores are being used.

2. If the cores are unbalanced, you should distribute the interfaces.

o If Performance Pack is enabled - run the sim affinity -s command to

use static affinity to balance the interfaces between the SND cores.

o If Performance Pack is disabled – run the fw ctl affinity -s command

to use static affinity to balance the interfaces between the SND cores.
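One way to decide which interfaces to move is a greedy plan. The busiest interfaces are placed first, each on the least-loaded SND core. The Python sketch below assumes that you already have a per-interface load figure, for example packets per second. `plan_affinity` is a hypothetical helper, and you still apply the resulting plan with the commands above.

```python
import heapq


def plan_affinity(if_load: dict[str, float], snd_cores: list[int]) -> dict[str, int]:
    """Greedy plan: busiest interface first, onto the least-loaded SND core.

    if_load holds a hypothetical, previously measured load per interface.
    """
    if not snd_cores:
        raise ValueError("at least one SND core is required")
    heap = [(0.0, core) for core in snd_cores]  # (assigned load, core id)
    heapq.heapify(heap)
    plan: dict[str, int] = {}
    for iface, load in sorted(if_load.items(), key=lambda kv: kv[1], reverse=True):
        core_load, core = heapq.heappop(heap)
        plan[iface] = core
        heapq.heappush(heap, (core_load + load, core))
    return plan
```

With two SND cores and two heavy interfaces, the plan puts the heavy interfaces on separate cores. It then fills the remaining capacity with the lighter interfaces.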

Working with Cores

Here are some important tips to remember when you are working with cores.

You should map heavily used interfaces' affinity to separate cores.

If Performance Pack is enabled and you have a pair of interfaces that serve the same connections, map the affinity of both interfaces to the same core. In most cases, Performance Pack’s automatic affinity provides optimal utilization. If it does not, set the interface affinity manually with the sim affinity -s command.

For more information, refer to the “sim affinity” section in the R70 Performance Pack

Administration Guide.

Additional performance tips can be found in sk33250.

Allocating a Core for Heavy Logging

If the gateway is performing heavy logging, it may be advisable to allocate a processing core to the fwd daemon, which performs the logging. As with allocating a core to the SND, this reduces the number of cores available for kernel instances.

To allocate a processing core to the fwd daemon:

1. Reduce the number of kernel instances using cpconfig.

2. Set the fwd daemon affinity, as detailed below.

Setting the fwd Daemon Affinity

Use the fw ctl affinity -l -r command to check which processing cores are running the kernel instances and which cores are handling interface traffic. Set the fwd daemon affinity to the remaining core in order to allocate it to the fwd daemon.

NOTE: If interface affinities are attached to a specific core, avoid setting the affinity of the fwd daemon to that core. In general, it is recommended to attach each core to only one of the following components: network interfaces, kernel firewall instances, or user space processes/daemons. Avoid attaching more than one of these components to the same core.

When you set affinities for Check Point daemons (such as the fwd daemon), they are loaded

at boot from the fwaffinity.conf configuration text file located at: $FWDIR/conf.

Edit the file by adding the following line:

n fwd <cpuid>

where <cpuid> is the number of the processing core to be set as the affinity of the

fwd daemon.

For example, to set core #2 as the affinity of the fwd daemon, add to the file:

n fwd 2

You must reboot the server for the fwaffinity.conf settings to take effect.

After reboot, you can verify the configuration by running the fw ctl affinity -l -r command.

Here is an example of the output:

# fw ctl affinity -l -r
CPU 0: Mgmt Lan1 Lan2
CPU 1: Lan3 Lan4
CPU 2: fwd
CPU 3: fw_4
CPU 4: fw_3
CPU 5: fw_2
CPU 6: fw_1
CPU 7: fw_0
All: cprid cpd
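The rule in the NOTE above, one kind of component per core, can be checked against this output. The Python sketch below parses `fw ctl affinity -l -r` output in the format shown and reports cores that mix kinds. It makes three assumptions. Items named `fw_<n>` are kernel instances. A small hypothetical set of names (`fwd`, `cpd`, `cprid`) are daemons. Everything else is an interface. Adjust the daemon list for your gateway.

```python
import re

CPU_LINE = re.compile(r"^CPU\s+(\d+):\s*(.*)$")
INSTANCE = re.compile(r"^fw_\d+$")
# Hypothetical list of daemon names; extend it for your gateway.
DAEMONS = {"fwd", "cpd", "cprid"}


def parse_affinity(text: str) -> dict[int, list[str]]:
    """Map CPU id -> items; the "All:" line is ignored."""
    result: dict[int, list[str]] = {}
    for line in text.splitlines():
        m = CPU_LINE.match(line.strip())
        if m:
            result[int(m.group(1))] = m.group(2).split()
    return result


def kind(item: str) -> str:
    if INSTANCE.match(item):
        return "instance"
    if item in DAEMONS:
        return "daemon"
    return "interface"  # assumption: any other name is an interface


def mixed_cores(affinity: dict[int, list[str]]) -> dict[int, set[str]]:
    """Cores that host more than one kind of component."""
    mixed: dict[int, set[str]] = {}
    for cpu, items in affinity.items():
        kinds = {kind(item) for item in items}
        if len(kinds) > 1:
            mixed[cpu] = kinds
    return mixed
```

For the example output above, mixed_cores returns an empty dictionary, because every core hosts only interfaces, only fwd, or only one kernel instance.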

VPN and VoIP Traffic

With CoreXL, VPN tunnel establishment and VoIP control connections are processed in firewall instance 0. This means that CoreXL does not provide scalability for these scenarios.

If Performance Pack is enabled, then the VPN traffic and VoIP data connections are

accelerated by the Performance Pack and pass through the acceleration path to achieve low

latency and high performance.

Firewall and IPS Inspection

When you run CoreXL, optimal performance is achieved when connections are load balanced across the instances and all cores work in parallel. See the Balancing Core Utilization section for more information.

In lab staging tests with CoreXL, use many source and/or destination IPs. Several hundred distinct IP pairs are usually sufficient to balance the connections among the kernel instances. Do not use an extremely high number of IPs, because this may make the templates ineffective: connections can share a template only when they differ in source port alone, so every additional IP pair needs its own template.
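To illustrate what "several hundred distinct IP pairs" might look like in a lab plan, the Python sketch below enumerates distinct source/destination pairs from two test subnets. The subnets (the 198.51.100.0/24 and 203.0.113.0/24 documentation ranges), the count of 400, and the helper name are illustrative assumptions, not values from this guide; how the pairs are fed to a traffic generator depends on the tool you use.

```python
import ipaddress
import itertools


def ip_pairs(src_net: str, dst_net: str, count: int):
    """Return up to `count` distinct (source, destination) address pairs."""
    sources = ipaddress.ip_network(src_net).hosts()
    destinations = list(ipaddress.ip_network(dst_net).hosts())
    # Every (source, destination) combination is unique by construction.
    all_pairs = ((src, dst) for src in sources for dst in destinations)
    return list(itertools.islice(all_pairs, count))


# Hypothetical lab plan: 400 pairs from two documentation subnets.
pairs = ip_pairs("198.51.100.0/24", "203.0.113.0/24", 400)
print(len(pairs), len(set(pairs)))  # 400 400
```

Keeping the count in the hundreds, rather than tens of thousands, follows the guidance above about not defeating templates with an extremely high number of IPs.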

Working with ClusterXL

This section discusses how ClusterXL can have an impact on network performance.

Static NAT with SmartDefense Protections

Using Static NAT with SmartDefense protections can cause asymmetric routing between the cluster members, which has a negative impact on network performance. Asymmetric routing (a non-sticky connection) occurs when one member in a Load Sharing configuration handles one direction of a connection and a different member handles the other direction.

Some SmartDefense protections require the connection to be sticky: its packets must be handled by the same cluster member. Network performance can be reduced when a connection that must be sticky is routed asymmetrically. For example:

Flush and ACK: The return packet for this connection is not going to be handled by the original cluster member. The original member holds the packet until it is synchronized and acknowledged by the other member.

Forwarding: A cluster member forwards packets to the member that handled the first packet of the connection.

Memory Allocation

Memory allocation failures can reduce the performance of the system.

NOTE: Do not run lab tests for best performance while memory allocation failures are occurring. For example, do not run a lab test with so many concurrent connections that memory allocations fail.


To check whether memory allocations have failed:

1. Run the fw ctl pstat command.

2. Search for failures in the kmem and smem statistics (the "failed alloc" and "failed free" values in the following example).

Here is a sample output of memory allocations:

Machine Capacity Summary:
  Memory used: 20% (165MB out of 823MB) - below low watermark
  Concurrent Connections: 0% (25 out of 999900) - below low watermark
  Aggressive Aging is not active

Hash kernel memory (hmem) statistics:
  Total memory allocated: 257949696 bytes in 62914 4KB blocks using 62 pools
  Initial memory allocated: 20971520 bytes (Hash memory extended by 236978176 bytes)
  Memory allocation limit: 862978048 bytes using 512 pools
  Total memory bytes used: 5977548 unused: 251972148 (97.68%) peak: 72292464
  Total memory blocks used: 1726 unused: 61188 (97%) peak: 17803
  Allocations: 95277797 alloc, 0 failed alloc, 95201809 free

System kernel memory (smem) statistics:
  Total memory bytes used: 392118672 peak: 420534724
  Blocking memory bytes used: 422932 peak: 958412
  Non-Blocking memory bytes used: 391695740 peak: 419576312
  Allocations: 5894755 alloc, 0 failed alloc, 5893180 free, 0 failed free

Kernel memory (kmem) statistics:
  Total memory bytes used: 139844652 peak: 204210436
  Allocations: 101171165 alloc, 0 failed alloc, 101094365 free, 0 failed free

External Allocations: 0 for packets, 2660 for SXL

NOTE: Although failures in hmem are legitimate, they might impact performance, especially when CoreXL is enabled. For optimal performance, there should be no failed memory allocations.
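Scanning for failures by eye is error-prone on long outputs. The Python sketch below is one possible approach, not a Check Point tool: it reads text in the fw ctl pstat layout shown above and collects the "failed alloc" counter for each of the hmem, smem and kmem sections. It assumes each section starts with a line containing "(hmem) statistics", "(smem) statistics" or "(kmem) statistics"; the function name is hypothetical.

```python
import re

# Excerpt of the sample fw ctl pstat output above.
SAMPLE = """\
Hash kernel memory (hmem) statistics:
  Allocations: 95277797 alloc, 0 failed alloc, 95201809 free
System kernel memory (smem) statistics:
  Allocations: 5894755 alloc, 0 failed alloc, 5893180 free, 0 failed free
Kernel memory (kmem) statistics:
  Allocations: 101171165 alloc, 0 failed alloc, 101094365 free, 0 failed free
"""


def failed_allocations(text: str) -> dict[str, int]:
    """Return the 'failed alloc' count for each memory section found."""
    result = {}
    section = None
    for line in text.splitlines():
        header = re.search(r"\((hmem|smem|kmem)\) statistics", line)
        if header:
            section = header.group(1)  # remember which pool we are in
            continue
        counter = re.search(r"(\d+) failed alloc", line)
        if counter and section:
            result[section] = int(counter.group(1))
    return result


failures = failed_allocations(SAMPLE)
print(failures)                  # {'hmem': 0, 'smem': 0, 'kmem': 0}
print(any(failures.values()))    # False
```

Any non-zero value means the system had memory allocation failures, and lab results taken at that time should not be treated as best-case performance.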

Resolving Memory Problems

Here are some possible solutions to memory allocation problems:

- On open servers, you can install more memory. However, the maximum amount of memory that can be used by the kernel is 2 GB.

- You can decrease the TCP end timeout.

- You can decrease the number of concurrent connections to reduce memory consumption.


SmartView Tracker Logs and dmesg Output

You can use SmartView Tracker logs and dmesg output to detect problematic events that can impede network performance. You may encounter one or more of the following events: cluster failovers, cluster synchronization overload, memory problems, and dropped packets.

Sample SmartView Tracker Logs

The following SmartView Tracker logs are examples of events that can impede network performance:

- member [ID] ([IP]) <is active|is down|is stand-by|is initializing> ([REASON]).

This message is issued whenever a cluster member changes its state. The log text specifies the new state of the member.

- [DEVICE] on member [ID] ([IP]) detected a problem ([REASON]).

Either an error was detected by the pnote device, or the device has not reported its state for a number of seconds (as set by the "timeout" option of the pnote).

- interface [INTERFACE NAME] of member [ID] ([IP]) is down (receive <up|down>, transmit <up|down>).

This message is issued whenever an interface encounters a problem, either in receiving or in transmitting packets. Note that in this case the interface may still be working properly as far as the OS is concerned, but it is unable to communicate with other cluster members due to a faulty cluster configuration.

Sample dmesg Log

The following dmesg log is an example of an event that can impede network performance:

FW-1: State synchronization is in risk. Please examine your synchronization network to avoid further problems!

For more information on dmesg log messages, see the R70 ClusterXL Administration Guide.


Optimizing the Session Rate

This section discusses factors that affect session rate and can have an impact on performance.

Working with SecureXL

This section discusses how SecureXL can have an impact on session rate.

Concurrent Connections

Make sure that the total number of concurrent connections is appropriate for the TCP end timeout. Too many concurrent connections can impede the performance of the R70 Security Gateway.

You can calculate the maximum number of concurrent connections by multiplying the session establishment rate by the TCP end timeout (by default, 20 seconds).

NOTE: A session rate test opens many connections. For the test to be valid, you must make sure that it is not limited by the maximum number of connections.
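The calculation above is simple arithmetic, shown here as a small Python sketch. The session rate of 50,000 connections per second is a hypothetical test target, and the function names are illustrative only; the limit of 1,000,000 matches the sample fw tab output later in this section.

```python
def concurrent_connections(session_rate: int, tcp_end_timeout_s: int = 20) -> int:
    """Estimate connections-table entries as session rate x TCP end timeout."""
    return session_rate * tcp_end_timeout_s


def fits_in_table(session_rate: int, limit: int, tcp_end_timeout_s: int = 20) -> bool:
    """True if the estimated entries stay below the connections table limit."""
    return concurrent_connections(session_rate, tcp_end_timeout_s) < limit


# Hypothetical test target: 50,000 new sessions per second.
print(concurrent_connections(50_000))        # 1000000
print(fits_in_table(50_000, 1_000_000))      # False
print(fits_in_table(50_000, 1_000_000, 2))   # True
```

With the default 20-second TCP end timeout, this hypothetical test would fill a 1,000,000-entry table, so the result would be capped by the limit rather than by gateway performance. Reducing the timeout or raising the limit, as described in step 3 below, removes that cap.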

To compare the number of concurrent connections with the maximum limit of connections:

1. Use the fw tab -t connections command to display the maximum limit of the connections table.

For example:

[Expert@cpmodule]# fw tab -t connections
localhost:
-------- connections --------
dynamic, id 8158, attributes: keep, sync, aggressive aging, kbuf 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31, expires 25, refresh, limit 1000000, hashsize 1048576, free function c2f372c0 0, post sync handler c2f2b230

2. Use the fw tab -t connections -s command to find the current and peak number of entries in the connections table. For example:

[Expert@cpmodule]# fw tab -t connections -s
HOST       NAME         ID     #VALS   #PEAK    #SLINKS
localhost  connections  8158   26      244914   40


3. If the peak number of connections has reached the limit, you must perform one of the following actions:

o Reduce the TCP end timeout.

a) From SmartDashboard, select Policy > Global Properties. The Global Properties window opens.

b) Select Stateful Inspection.

c) Decrease the number in the TCP end timeout: field.

o Increase the maximum concurrent connections.

a) From SmartDashboard, double-click the gateway object. The Check Point Gateway window opens.

b) Select Capacity Optimization.

c) Increase the number in the Maximum concurrent connections: field.

NOTE: When Aggressive Aging is enabled and the number of concurrent connections is near the limit, there can be a performance impact.
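Steps 1 to 3 compare two numbers taken from two command outputs. The Python sketch below shows one way to extract them from text in the formats shown above; it is not a Check Point utility. It assumes the limit appears as "limit <n>" in the fw tab -t connections output and that the -s output has the column order HOST, NAME, ID, #VALS, #PEAK, #SLINKS. The function names are hypothetical.

```python
import re

# Samples copied from the fw tab outputs above.
TAB_OUTPUT = (
    "dynamic, id 8158, attributes: keep, sync, aggressive aging, "
    "kbuf 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31, expires 25, refresh, "
    "limit 1000000, hashsize 1048576, free function c2f372c0 0, "
    "post sync handler c2f2b230"
)
SUMMARY_OUTPUT = """\
HOST       NAME         ID     #VALS   #PEAK    #SLINKS
localhost  connections  8158   26      244914   40
"""


def table_limit(text: str) -> int:
    """Extract the 'limit <n>' value from fw tab -t connections output."""
    match = re.search(r"\blimit (\d+)", text)
    if match is None:
        raise ValueError("no 'limit' field found")
    return int(match.group(1))


def table_usage(text: str) -> tuple[int, int]:
    """Return (#VALS, #PEAK) from the connections row of fw tab -s output."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[1] == "connections":
            return int(fields[3]), int(fields[4])
    raise ValueError("connections row not found")


limit = table_limit(TAB_OUTPUT)
current, peak = table_usage(SUMMARY_OUTPUT)
print(limit, current, peak)                            # 1000000 26 244914
print(f"peak uses {peak / limit:.1%} of the limit")    # peak uses 24.5% of the limit
print("limit reached" if peak >= limit else "limit not reached")  # limit not reached
```

In this sample, the peak stays well below the limit, so neither action in step 3 is required.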

Aggressive Aging

Aggressive Aging is triggered when memory consumption is high. The R70 Security Gateway then deletes some connections to reduce consumption: it deletes old connections, particularly TCP sessions that were closed at least 3 seconds ago. Aggressive Aging reduces the number of concurrent connections to prevent memory exhaustion. However, when Aggressive Aging starts deleting connections, there is a noticeable performance impact.

NOTE: Aggressive Aging can invalidate a performance test. For best results, make sure that Aggressive Aging is not active during the test. Either disable it, or run the fw ctl pstat command to confirm that the test uses less than 70% of the machine's memory. For more information on machine memory, refer to the Memory Allocation section.
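The 70% guideline can be checked against the "Machine Capacity Summary" part of the fw ctl pstat output shown in the Memory Allocation section. This Python sketch is illustrative only: it assumes the "Memory used: <n>%" and "Aggressive Aging is not active" wording from that sample, and the function names are hypothetical.

```python
import re

# Excerpt of the sample fw ctl pstat output in the Memory Allocation section.
PSTAT_SUMMARY = """\
Machine Capacity Summary:
  Memory used: 20% (165MB out of 823MB) - below low watermark
  Concurrent Connections: 0% (25 out of 999900) - below low watermark
  Aggressive Aging is not active
"""


def memory_used_percent(text: str) -> int:
    """Extract the integer percentage from the 'Memory used' line."""
    match = re.search(r"Memory used: (\d+)%", text)
    if match is None:
        raise ValueError("'Memory used' line not found")
    return int(match.group(1))


def safe_for_test(text: str, threshold: int = 70) -> bool:
    """True if memory use is below the threshold and Aggressive Aging is inactive."""
    return (memory_used_percent(text) < threshold
            and "Aggressive Aging is not active" in text)


print(memory_used_percent(PSTAT_SUMMARY))  # 20
print(safe_for_test(PSTAT_SUMMARY))        # True
```

Running such a check during the test, not only before it, is the safer choice, because memory use grows as connections accumulate.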

Templates

To accelerate connection establishment, a mechanism attempts to "group together" all connections that match a specific service and differ only in source port. When the first packet of the first connection in such a group arrives, it is processed by the firewall, which offloads the connection to the SecureXL device. The firewall also offloads a "template", which allows the device to accelerate all other connections in this group. When the first packet of another connection in this group arrives, the acceleration device can handle it by itself. This grouping allows the acceleration device to handle almost all packets, even the first packet of most connections.
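The grouping idea can be modeled in a few lines of Python. This is a conceptual sketch of the behavior described above, not SecureXL's actual data structures: it assumes a template is keyed by every connection attribute except the source port (source IP, destination IP, destination port, protocol), and it records which path would handle the first packet of each connection. The class and function names and the addresses are hypothetical.

```python
from typing import NamedTuple


class Conn(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def template_key(conn: Conn) -> tuple:
    """Everything except the source port identifies a template group."""
    return (conn.src_ip, conn.dst_ip, conn.dst_port, conn.proto)


def first_packet_paths(conns: list[Conn]) -> list[str]:
    """Conceptual model: which path handles the first packet of each connection."""
    templates = set()
    paths = []
    for conn in conns:
        key = template_key(conn)
        if key in templates:
            paths.append("accelerated")  # matched an offloaded template
        else:
            paths.append("firewall")     # firewall inspects, then offloads a template
            templates.add(key)
    return paths


conns = [
    Conn("10.1.1.5", 40001, "10.2.2.9", 80, "tcp"),
    Conn("10.1.1.5", 40002, "10.2.2.9", 80, "tcp"),  # only the source port differs
    Conn("10.1.1.5", 40003, "10.2.2.9", 80, "tcp"),
    Conn("10.1.1.6", 40004, "10.2.2.9", 80, "tcp"),  # new source IP: new group
]
print(first_packet_paths(conns))
# ['firewall', 'accelerated', 'accelerated', 'firewall']
```

The last connection shows why an extremely high number of distinct IPs reduces template effectiveness: each new address pair starts a new group on the firewall path.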


To verify that templates are being created:

Run the fwaccel stat command.

Here is a sample output of the fwaccel stat command. The second line (Accept Templates : enabled) indicates that templates are being created.

Accelerator Status : on
Accept Templates : enabled
Drop Templates : disabled
Accelerator Features : Accounting, NAT, Cryptography, Routing, HasClock, Templates, Synchronous, IdleDetection, Sequencing, TcpStateDetect, AutoExpire, DelayedNotif, TcpStateDetectV2, CPLS, WireMode, DropTemplates, Streaming, MultiFW, AntiSpoofing, DoS Defender
Cryptography Features : Tunnel, UDPEncapsulation, MD5, SHA1, NULL, 3DES, DES, CAST, CAST-40, AES-128, AES-256, ESP, LinkSelection, DynamicVPN, NatTraversal, EncRouting
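If you check template status from a script, the "Name : value" layout above is easy to parse. The Python sketch below is a hypothetical helper, not part of the fwaccel tool; it assumes each status line contains exactly one name before the first colon.

```python
# First lines of the sample fwaccel stat output above.
FWACCEL_STAT = """\
Accelerator Status : on
Accept Templates : enabled
Drop Templates : disabled
"""


def parse_fwaccel_stat(text: str) -> dict[str, str]:
    """Split each 'Name : value' line at the first colon."""
    status = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            status[name.strip()] = value.strip()
    return status


status = parse_fwaccel_stat(FWACCEL_STAT)
print(status["Accept Templates"] == "enabled")  # True
```

Splitting at the first colon only keeps the comma-separated feature lists intact if you also pass the Accelerator Features lines to the function.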

If templates are not being created, a rule is preventing template creation. Refer to the Using Templates with Rules section for more information.

Conditions that Prevent Using Templates

There are several conditions that can prevent a template from being created or from being effective:

- The connections cannot be grouped because the source port is not the only variation. A template is not created for these connections, and the first packet is handled by the firewall path.

- Traffic that requires NAT does not use a template.

- VPN traffic does not use a template.

- Complex connections (FTP, H323, etc.) do not use a template.

- Non-TCP/UDP traffic does not use a template.
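These conditions amount to a checklist. The Python sketch below encodes the traffic-level conditions from the list (NAT, VPN, complex connections, non-TCP/UDP) as a hypothetical function for reasoning about a test plan; the first condition, grouping by source port only, is not modeled here. It does not query the gateway, and the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Traffic:
    protocol: str                   # e.g. "tcp", "udp", "icmp"
    needs_nat: bool = False
    is_vpn: bool = False
    complex_service: bool = False   # e.g. FTP, H323


def template_blockers(traffic: Traffic) -> list[str]:
    """List the conditions above that stop this traffic from using a template."""
    reasons = []
    if traffic.protocol not in ("tcp", "udp"):
        reasons.append("non-TCP/UDP")
    if traffic.needs_nat:
        reasons.append("NAT")
    if traffic.is_vpn:
        reasons.append("VPN")
    if traffic.complex_service:
        reasons.append("complex connection")
    return reasons


print(template_blockers(Traffic("tcp")))                  # []
print(template_blockers(Traffic("tcp", needs_nat=True)))  # ['NAT']
print(template_blockers(Traffic("icmp")))                 # ['non-TCP/UDP']
```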

Using Templates with Rules

Some rules in SmartDashboard can prevent a template from being created. Such a rule affects all traffic that matches it, as well as traffic that matches any rule below it. In SmartDashboard, place all rules that can use a template at the top of the rule base (unless this conflicts with other considerations). After you change the rule base, SecureXL automatically creates new templates for grouped connections.


Here are rules that can prevent a template from being created:

- Rules with the following objects:

  o Time object

  o Port range object

  o Dynamic object

- Rules with a service that has a handler (protocol type) enabled.

- Rules with "complex" services (that is, services that have anything specified in the Match field, or that have Enable reply from any port selected in their Advanced section).

- Rules with RPC/DCOM/DCE-RPC services.

- Rules with client authentication or session authentication.

- Any rule, when the SYN Defender or Small PMTU features are activated.
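Because a rule that prevents templates also affects every rule below it, rule order matters. The Python sketch below models this effect for an ordered rule base. The rule names and property labels are hypothetical stand-ins for the items listed above. SYN Defender and Small PMTU are gateway-wide features, so they are not modeled per rule.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical labels for the rule properties listed above.
BLOCKING = {
    "time_object", "port_range_object", "dynamic_object",
    "service_handler", "complex_service", "rpc_dcom_dce_rpc_service",
    "client_authentication", "session_authentication",
}


@dataclass
class Rule:
    name: str
    properties: set[str] = field(default_factory=set)


def first_blocking_rule(rules: list[Rule]) -> Optional[int]:
    """Index of the first rule that prevents templates, or None."""
    for index, rule in enumerate(rules):
        if rule.properties & BLOCKING:
            return index
    return None


def template_capable(rules: list[Rule]) -> list[str]:
    """Rules above the first blocking rule; that rule and all below it are affected."""
    index = first_blocking_rule(rules)
    return [rule.name for rule in (rules if index is None else rules[:index])]


rule_base = [Rule("web"), Rule("dns"), Rule("office-hours", {"time_object"}), Rule("mail")]
print(template_capable(rule_base))  # ['web', 'dns']

reordered = [Rule("web"), Rule("dns"), Rule("mail"), Rule("office-hours", {"time_object"})]
print(template_capable(reordered))  # ['web', 'dns', 'mail']
```

Moving the "mail" rule above the time-object rule, as the second rule base does, is the reordering that the section above recommends.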

Delayed Notification

With Delayed Notification, a SecureXL device can create a connection that matches a template and notify the firewall about the connection only after a period of time. This feature further improves the connection rate of the SecureXL device.

The fwaccel stats command shows the total number of delayed connections (delayed TCP conns).

Refer to the Managing Non-Accelerated Traffic section for more information.

The fwaccel templates command shows the delay time for each template in the DLY entry.

On a single gateway device, Delayed Notification is enabled by default.

On a ClusterXL gateway, Delayed Notification is disabled by default.

Working with ClusterXL

This section discusses how ClusterXL can have an impact on session rate.

State Synchronization

State Synchronization enables all machines in the cluster to be aware of the connections passing through each of the other machines. It ensures that if a cluster member fails, the connections that were handled by the failed machine are maintained by the other machines. However, State Synchronization has a performance cost, and under heavy load, sync packets may occasionally be lost.


If dmesg shows the following error message, there may be connectivity problems:

FW-1: State synchronization is in risk. Please examine your synchronization network to avoid further problems!

These problems are more likely to occur in Load Sharing configurations and after failover.
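To catch this message in routine monitoring, you can search the dmesg text for it. The following Python sketch is a minimal filter, assuming the message wording shown above. The sample's first line is a placeholder, not real gateway output, and the function name is hypothetical.

```python
SYNC_AT_RISK = "State synchronization is in risk"


def sync_at_risk_lines(dmesg_text: str) -> list[str]:
    """Return the dmesg lines that report a sync at risk condition."""
    return [line for line in dmesg_text.splitlines() if SYNC_AT_RISK in line]


sample = (
    "(unrelated kernel message)\n"
    "FW-1: State synchronization is in risk. Please examine your "
    "synchronization network to avoid further problems!\n"
)
print(len(sync_at_risk_lines(sample)))  # 1
```

On a live gateway, the input text could come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout.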

Sync at Risk

A sync at risk condition occurs when a cluster member cannot send delta syncs to another cluster member at the required rate. When this happens, the sending member has to discard unacknowledged delta syncs, so the receiving member might receive partial (inconsistent) information.

A sync at risk condition might result in connectivity problems.

These problems generally do not occur in High Availability configurations, although they may occur after failover.

Connectivity problems are more critical in Load Sharing configurations, especially with asymmetric routing. Even without asymmetric routing, "global" (not per-connection) information can be lost and cause connectivity issues.

Resolving a Sync at Risk Condition

To resolve a sync at risk condition, you can decide not to synchronize a service, but only if ALL of the following conditions are true:

1. A significant portion of the traffic crossing the cluster uses a particular service. If you do not synchronize this service, the amount of synchronization traffic is reduced and cluster performance is enhanced.

2. The service usually opens short connections, whose loss may not be noticed. DNS (over UDP) and HTTP are typically responsible for most connections, and their connections are generally short-lived and recoverable at the application level. However, services that typically open long connections, such as FTP, should always be synchronized.

3. The configuration ensures bi-directional stickiness for all connections. Such configurations do not require synchronization to operate (only to maintain High Availability). They include:

  o Any cluster in High Availability mode (for example, ClusterXL New HA or Nokia VRRP).

  o ClusterXL in a Load Sharing mode with clear connections (no VPN or static NAT).

  o OPSEC clusters that guarantee full stickiness (refer to the OPSEC cluster's documentation).
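Because ALL three conditions must hold, the decision reduces to a logical AND. The Python sketch below expresses it with hypothetical names. The guide gives no numeric threshold for a "significant" share of traffic, so the caller has to choose one. The configuration labels stand in for the sticky configurations listed under condition 3.

```python
from dataclasses import dataclass

# Hypothetical labels for the configurations listed under condition 3.
STICKY_CONFIGS = {"high_availability", "load_sharing_clear", "opsec_full_stickiness"}


@dataclass
class ServiceProfile:
    name: str
    traffic_share: float  # fraction of cluster traffic that uses this service
    short_lived: bool     # short connections, recoverable at the application level


def may_skip_sync(svc: ServiceProfile, cluster_config: str, significant_share: float) -> bool:
    """True only if ALL three conditions hold; the threshold is the caller's choice."""
    return (
        svc.traffic_share >= significant_share
        and svc.short_lived
        and cluster_config in STICKY_CONFIGS
    )


dns = ServiceProfile("domain-udp", traffic_share=0.4, short_lived=True)
ftp = ServiceProfile("ftp", traffic_share=0.3, short_lived=False)
print(may_skip_sync(dns, "load_sharing_clear", significant_share=0.25))       # True
print(may_skip_sync(ftp, "load_sharing_clear", significant_share=0.25))       # False
print(may_skip_sync(dns, "load_sharing_static_nat", significant_share=0.25))  # False
```

FTP fails the check even with a large traffic share because its connections are long-lived, which matches the guidance that such services should always be synchronized.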


Delayed Synchronization and ClusterXL

In a ClusterXL configuration, the SecureXL Delayed Synchronization feature is disabled by default. You may want to enable Delayed Synchronization to improve session rate.

While a connection is delayed, the other cluster members are not notified about it, so the connection is not synchronized to them. Delayed Synchronization can therefore significantly reduce the amount of synchronization traffic and improve performance. However, if a failover occurs, these connections are terminated and connectivity is lost. Weigh these advantages and disadvantages before you enable Delayed Synchronization.

To enable Delayed Synchronization from SmartDashboard:

1. From the Service tab, double-click the desired service. The Service Properties window opens.

2. Click Advanced…. The Advanced Service Properties window opens.

3. Select the Start Synchronizing checkbox.

4. Click OK.

Improving NAT Session Rate

You can disable SecureXL to improve the NAT session rate.

To improve NAT session rate:

1. Disable SecureXL. Note that this also significantly lowers the overall packet rate, throughput, and IPS performance.

2. Do one of the following:

o Decrease the TCP end timeout to 2 seconds. Refer to the Concurrent Connections section for more information on decreasing the TCP end timeout.

Or

o Increase the dispatcher connection table hash size: edit $FWDIR\modules\fwkern.conf to contain fwmultik_gconn_tab_hsize=8388608, and reboot the machine. However, this change reduces the maximum number of concurrent connections.

References

CP R70 Firewall Administration Guide

CP R70 Performance Pack Administration Guide

CP R70 ClusterXL Administration Guide

CP R70 IPS Administration Guide