Architectural Impact of Stateful Networking Applications
Javier Verdú, Jorge García, Mario Nemirovsky, Mateo Valero
The 1st Symposium on Architectures for
Networking and Communications Systems
Princeton, New Jersey, USA October 26-28, 2005
ANCS - I
Trends of Internet
Important growth of Internet traffic
  Consequent increase in traffic aggregation
    • Low packet/flow temporal locality
End-point routers & appliances execute stateful apps
  Upper-layer packet processing
    • Larger workloads per packet
Facing new security issues
  Improvement of attack methods
    • Need to spread knowledge further than a single packet
Granularity Levels
…
Holding
Company
Department
User
Application
Flow
Packet
Stateful Application Model
[Figure: application state placed along a State Lifetime axis (- to +) across the levels Packet, Flow, User, Department, Company]
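As an illustration of the flow level in the model above, here is a minimal sketch of a per-flow state table. It assumes a flow is identified by a 5-tuple and that both directions of a connection share one state entry. The names `FlowKey`, `FlowState`, and `FlowTable` are hypothetical and do not come from Snort or Argus.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, protocol)
FlowKey = Tuple[str, str, int, int, str]


@dataclass
class FlowState:
    """State that outlives a single packet (illustrative fields only)."""
    packets: int = 0
    byte_count: int = 0


class FlowTable:
    """Keeps one FlowState per bidirectional flow."""

    def __init__(self) -> None:
        self._flows: Dict[FlowKey, FlowState] = {}

    @staticmethod
    def canonical(key: FlowKey) -> FlowKey:
        # Order the two endpoints so both directions map to the same key.
        src_ip, dst_ip, src_port, dst_port, proto = key
        lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        return (lo[0], hi[0], lo[1], hi[1], proto)

    def update(self, key: FlowKey, length: int) -> FlowState:
        # A stateless app would skip this lookup; a stateful app pays for it
        # on every packet.
        state = self._flows.setdefault(self.canonical(key), FlowState())
        state.packets += 1
        state.byte_count += length
        return state

    def __len__(self) -> int:
        return len(self._flows)
```

Coarser levels (user, department, company) would use a shorter key and keep the entry alive longer, which is one way to read the state lifetime axis.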
Research Limitations on Stateful Apps
Pool of benchmark suites for network processors
  CommBench
  NetBench
  NpBench
  NPForum
Lack of stateful benchmarks
  Most of them are stateless benchmarks
Creating new benchmarks
  Reliability???
    • State size
    • State management
Talk Outline
Introduction
Network Traffic Properties
Description of Environment
Architectural Impact Analysis
Summary
Network Traffic Properties
Traffic Aggregation Level
  Unique flow rate in a given window
Intra-Flow Temporal Distribution
  How are the packets exchanged?
Inter-Flow Temporal Distribution
  Packet rate between packets of the same flow
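The traffic aggregation level above is described as the unique flow rate in a given window. A minimal sketch of one possible reading, assuming packets arrive as `(timestamp, flow_id)` pairs and the window is a fixed time interval; the function name and the input layout are assumptions, not the authors' tooling.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def unique_flows_per_window(
    packets: Iterable[Tuple[float, Hashable]], window: float
) -> Dict[int, int]:
    """Count the distinct flow ids seen in each fixed-size time window."""
    seen = defaultdict(set)
    for timestamp, flow_id in packets:
        # Window index 0 covers [0, window), index 1 covers [window, 2*window), ...
        seen[int(timestamp // window)].add(flow_id)
    return {index: len(flows) for index, flows in sorted(seen.items())}


# Hypothetical toy trace: three packets in the first second, three in the next.
trace = [(0.1, "A"), (0.2, "B"), (0.3, "A"), (1.1, "A"), (1.5, "C"), (1.9, "D")]
counts = unique_flows_per_window(trace, window=1.0)
```

A higher count per window means more flows compete for the same caches and predictor tables between two packets of any one flow.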
Benchmark Selection (I)
Snort is tuned with four different configurations
  Stream4
    • Prevents Stick/Snot attacks
  Flow-Portscan
    • Detects portscanning attacks
  SfPortscan
    • Detects a variety of portscanning attacks
  Merged Engines
    • The combination of the above engines
Argus is a monitoring/billing benchmark
  Currently it is included in NO benchmark suite
  Open source application
    • http://www.qosient.com
  Equivalent to the commercial tool Cisco NetFlow
Benchmark Selection (& II)
Obviously, stateless applications keep no flow state
The state size may vary a lot between applications
State management may also be quite different
[Figure: bar chart of flow state in bytes (axis 0 to 3000) per benchmark: Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus, any stateless app]
Evaluation Methodology
Instrumented binary code: ATOM
Trace-driven simulation: modified version of the SMTSim simulator
Simulation length
  Warming period
    • 10K packets
  Processing period
    • 50K packets
  Packet selection for the flow lifetime studies
Towards analysis of actual application behavior
  The baseline is an ample configuration
    • ROB size: 256 entries
      – No significant improvements with larger ROBs
    • Physical regs: 192 int, 192 FP
      – No stress due to lack of regs
    • Perceptron branch predictor
      – The most powerful configuration
    • 64KB I$, 64KB DL1$, 2MB L2$
      – No significant improvements with larger caches
Architectural Impact Analysis
Computational complexity
Available Parallelism
Impact of Bottlenecks
Branch Prediction
Data Cache Behavior
Computational Complexity (I)
There are no significant differences among benchmarks
  Roughly 35% - 45% of instructions are memory accesses
Argus is more memory intensive
[Figure: stacked bar chart of instructions per packet (axis 0 to 8000) per benchmark (Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus), broken down into integer computation, unconditional branch, conditional branch, load, and store]
Computational Complexity (& II)
The instruction mix is similar across all the packets
Some applications generate the heaviest workload in the first packets
Other applications show an almost constant workload
[Figure: instructions (axis 0 to 12000) per N-th packet of the flow lifetime (1, 2, 3, …, n-3, n-2, n-1, n), grouped into connecting, data transferring, and closing phases, for Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus]
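The per-packet plots in this deck use the positions 1, 2, 3, …, n-3, n-2, n-1, n within a flow. A minimal sketch of how a per-packet measurement could be bucketed that way, assuming three head positions and four tail positions as on the axis; `position_label` and `mean_by_position` are hypothetical helpers, and the way short flows are handled is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List

HEAD = 3  # positions 1, 2, 3
TAIL = 4  # positions n-3, n-2, n-1, n


def position_label(index: int, length: int) -> str:
    """Label the packet at 0-based `index` in a flow of `length` packets."""
    if index < HEAD:
        return str(index + 1)  # head positions win for very short flows
    from_end = length - 1 - index
    if from_end < TAIL:
        return "n" if from_end == 0 else f"n-{from_end}"
    return "middle"


def mean_by_position(flows: Dict[Hashable, List[float]]) -> Dict[str, float]:
    """Average a per-packet metric (e.g. instruction count) by flow position."""
    buckets = defaultdict(list)
    for values in flows.values():
        for index, value in enumerate(values):
            buckets[position_label(index, len(values))].append(value)
    return {label: mean(values) for label, values in buckets.items()}
```

With instruction counts as the metric, a concentrated workload shows up as large averages for positions 1 to 3, while a constant workload gives similar averages in every bucket.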
Available Parallelism
Processor configuration modified to avoid any constraint
The ILP is independent of the app category
  It is inherent to the application itself
The evaluated apps show low ILP: ~3.7 IPC
[Figure: bar chart of IPC (axis 0 to 10) per benchmark: Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus, NpBench (Control Plane), NpBench (Data Plane); two off-scale values are annotated (~4200) and (~45)]
Impact of Bottlenecks
Stateful apps show much lower performance
  Roughly 0.6 IPC on average
The importance of the packet processing
  Constant vs concentrated workload
Memory impact
  3x to 19x speedup
[Figure: bar chart of IPC (axis 0 to 4) per benchmark (Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus) for four configurations: baseline, perfect branch, perfect mem, perfect mem & perfect branch]
Branch Prediction (I)
High branch prediction accuracy on average
But we have two branch categories
  Flow independent: similar among packets -> easy to predict
  Flow dependent: flow related -> sensitive to traffic properties
[Figure: bar chart of branch prediction hit rate (axis 90% to 100%) per benchmark: Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus]
Branch Prediction (& II)
A single active connection
  Higher accuracy and no variations among n-th packets
High traffic aggregation level
  Lower accuracy and variations among n-th packets
Negative aliasing due to flow dependent branches
  Most of our applications hide this effect due to concentrated workload
[Figure: two panels, "No traffic aggregation level" and "High traffic aggregation level", each plotting branch prediction hit rate (axis 86% to 100%) per N-th packet of the flow lifetime (1, 2, 3, …, n-3, n-2, n-1, n), grouped into connecting, data transferring, and closing phases, for Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus]
Data Cache Behavior (I)
Stateful apps need only a small DL1$ to reach a steady miss rate
  Taking advantage of flow independent memory references
Almost 100% of DL2$ accesses are misses
  It is unable to keep the state of the active flows
Larger flow states emphasize the impact of network properties
  Getting a higher steady state even with low traffic aggregation
  The intra-flow distribution may be more helpful
[Figure: two line charts of miss rate for Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus: miss rate (axis 0% to 10%) vs L2$ size (1024, 2048, 4096, 8192 KB), and miss rate (axis 0% to 25%) vs DL1$ size (4, 8, 16, 32, 64, 128, 256, 512, 1024 KB)]
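A rough way to see why the L2$ cannot keep the state of the active flows: the baseline has a 2MB L2$, and the traffic-trace slide of this deck reports about 170K active flows, which leaves roughly 12 bytes of cache per flow, while the flow-state chart is scaled up to 3000 bytes. The sketch below is a toy model, not the simulator used in the talk. It assumes a fully associative LRU cache that holds whole flow states and a round-robin arrival order; `lru_miss_rate` and `round_robin` are hypothetical names.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Iterator


def lru_miss_rate(accesses: Iterable[Hashable], capacity: int) -> float:
    """Miss rate of a fully associative LRU cache holding `capacity` flow states."""
    cache = OrderedDict()
    misses = 0
    total = 0
    for flow_id in accesses:
        total += 1
        if flow_id in cache:
            cache.move_to_end(flow_id)  # mark as most recently used
        else:
            misses += 1
            cache[flow_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used state
    return misses / total if total else 0.0


def round_robin(num_flows: int, packets_per_flow: int) -> Iterator[int]:
    """Interleave flows so consecutive packets of one flow are far apart."""
    for _ in range(packets_per_flow):
        for flow_id in range(num_flows):
            yield flow_id


fits = lru_miss_rate(round_robin(8, 10), capacity=8)     # only cold misses
thrash = lru_miss_rate(round_robin(9, 10), capacity=8)   # every access misses
```

In this toy model, one flow more than the cache can hold is enough to turn every flow-state access into a miss. Real caches are set associative and work on lines, so the transition is less abrupt.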
Data Cache Behavior (& II)
Negative effects of memory are concentrated in the first packets
Constant workload applications show a similar miss rate for every packet
Extra miss rates for data structure maintenance
  Merged Engines: from 1.5% to 5% on average
[Figure: total data L2 miss rate (axis 0.0% to 3.0%) per N-th packet of the flow lifetime (1, 2, 3, …, n-3, n-2, n-1, n), grouped into connecting, data transferring, and closing phases, for Merged Engines, Stream4, Flow-Portscan, SfPortscan, Argus]
Summary (I)
We present the architectural impact of stateful networking applications
  An important new type of application
The behavior along the packets of a TCP connection
  Constant workload for the packets of a connection
  Workload concentrated in the first packets of a connection
Analysis of network traffic properties
  Branch prediction and data cache are sensitive to them
Summary (& II)
Reduced IPC on average
L2$ is unable to maintain the required states of active flows
Branch prediction may also improve once the memory bottleneck is solved
Other stateful applications may present different valuable results, but…
  The critical bottlenecks may be even more stressed
Our concern is…
  To have more sample applications to evaluate
  To analyse the apps in a more realistic environment
    • Running a number of applications simultaneously
Questions...
Traffic Traces
Filtered traffic trace
  Bidirectional TCP connections
Generating synthetic traffic traces
  Mixing different traffic traces
    • Based on microTimestamp sorting
  We are assuming a set of traces with the same link bandwidth
    • In our case: MRA link
  Avoiding the aliasing of IP addresses among aggregated traces
    • The set of traces is originally sanitized
The resulting traffic trace shows roughly 1Gbps
  170K active flows
    • Achieved from the original OC12 MRA link (622Mbps)
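The trace mixing described above can be sketched as a timestamp-ordered merge with per-trace address tagging. This is a minimal sketch under stated assumptions: each input trace is already sorted by timestamp, packets are reduced to a hypothetical `Packet` record, and aliasing is avoided by prefixing addresses with a trace index, which is a stand-in for whatever remapping the authors actually used.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Packet(NamedTuple):
    timestamp_us: int  # capture time in microseconds
    src_ip: str        # sanitized source address
    dst_ip: str        # sanitized destination address
    length: int        # packet length in bytes


def tag_addresses(trace: Iterable[Packet], trace_id: int) -> Iterator[Packet]:
    """Prefix addresses with the trace id so equal sanitized IPs cannot collide."""
    for packet in trace:
        yield packet._replace(
            src_ip=f"{trace_id}:{packet.src_ip}",
            dst_ip=f"{trace_id}:{packet.dst_ip}",
        )


def mix_traces(traces: List[Iterable[Packet]]) -> Iterator[Packet]:
    """Merge several timestamp-sorted traces into one timestamp-sorted trace."""
    tagged = [tag_addresses(trace, index) for index, trace in enumerate(traces)]
    return heapq.merge(*tagged, key=lambda packet: packet.timestamp_us)


# Hypothetical toy traces that reuse the same sanitized addresses.
trace_a = [Packet(10, "1.1.1.1", "2.2.2.2", 60), Packet(30, "2.2.2.2", "1.1.1.1", 1500)]
trace_b = [Packet(20, "1.1.1.1", "2.2.2.2", 40)]
mixed = list(mix_traces([trace_a, trace_b]))
```

Because `heapq.merge` is lazy, the merged trace can be streamed into a simulator without loading every input trace into memory.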