The CMS Event Builder Demonstrator based on Myrinet. Introduction; Myrinet Overview; Tests of the Switching Fabric; Event Building Studies; Future Work and Conclusions. Frans Meijers, CERN/EP, on behalf of the CMS DAQ group. CHEP2000, Padova, Italy, Feb 2000.
1 The CMS Event Builder Demonstrator based on Myrinet Frans Meijers. CHEP 2000, Padova Italy, Feb 2000
The CMS Event Builder Demonstrator based on Myrinet
• Introduction
• Myrinet Overview
• Tests of the Switching Fabric
• Event Building Studies
• Future Work and Conclusions
Frans Meijers, CERN/EP, on behalf of the CMS DAQ group
CHEP2000, Padova Italy, Feb 2000
Introduction
• DAQ architecture and EVB parameters
• Event building by switches: crossbar
• EVB traffic shaping: barrel shifter
• Banyan network
• A multistage 1024 port switch
• The CMS DAQ system
DAQ architecture and EVB parameters
[Figure: CMS DAQ architecture: Detector Front-end, Level-1 Trigger, Readout Systems, Builder Networks, Event Manager, Builder and Filter Systems, Run Control, Computing Services]
• Level-1 maximum trigger rate: 100 kHz
• Average event size: 1 Mbyte
• Builder network (512x512 port) aggregate throughput: 1 Tbps
• Number of Readout Units: 512
• Average event fragment size: 2 kbyte
• High Level Trigger acceptance: 1 - 10 %
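A quick consistency check of the parameters above: multiply the Level-1 rate by the event size, and split the event evenly over the Readout Units. This sketch assumes decimal units (1 kbyte = 10^3 bytes, 1 Mbyte = 10^6 bytes). The function names are illustrative only.

```python
# Back-of-envelope check of the EVB parameters listed above.
# Assumption: decimal units (1 kbyte = 1e3 bytes, 1 Mbyte = 1e6 bytes).

LEVEL1_RATE_HZ = 100e3    # Level-1 maximum trigger rate
EVENT_SIZE_BYTES = 1e6    # average event size
N_READOUT_UNITS = 512     # number of Readout Units


def aggregate_throughput_bps(rate_hz: float, event_bytes: float) -> float:
    """Bits per second the builder network must carry in total."""
    return rate_hz * event_bytes * 8


def fragment_size_bytes(event_bytes: float, n_units: int) -> float:
    """Average fragment size if each event is spread evenly over the RUs."""
    return event_bytes / n_units


if __name__ == "__main__":
    tbps = aggregate_throughput_bps(LEVEL1_RATE_HZ, EVENT_SIZE_BYTES) / 1e12
    kbyte = fragment_size_bytes(EVENT_SIZE_BYTES, N_READOUT_UNITS) / 1e3
    print(f"aggregate: {tbps:.2f} Tbps")
    print(f"fragment:  {kbyte:.2f} kbyte")
```

The results, 0.80 Tbps and 1.95 kbyte, agree with the rounded 1 Tbps and 2 kbyte figures on the slide.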
Event building by switches. Crossbar
The maximum switch load for random traffic is about 63% (large N limit) due to head-of-line blocking
Higher efficiency through:
• queues at input and/or output ports
• traffic shaping (example: barrel shifter, 100%)
NxN matrix: N² crosspoints
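A simple saturation model reproduces the 63% figure. It also gives the 68% expected for 4x4 in the random-traffic test later on. In each time slot every input picks an output uniformly at random, and each output serves only one input. The load is then the probability that an output is picked at least once: 1 - (1 - 1/N)^N. This is one model consistent with the quoted numbers, not necessarily the one the authors used. An exact analysis of FIFO input queues gives a somewhat lower large-N limit (2 - √2, about 58.6%).

```python
import math


def random_traffic_load(n: int) -> float:
    """Expected fraction of busy outputs of an n x n crossbar per time slot.

    Model (an assumption): each of the n inputs picks one of the n outputs
    uniformly at random; an output serves at most one input, so it is busy
    unless nobody picked it.
    """
    return 1.0 - (1.0 - 1.0 / n) ** n


for n in (4, 16, 512):
    print(f"{n:4d}x{n:<4d} load = {random_traffic_load(n):.3f}")
print(f"large-N limit: {1.0 - math.exp(-1.0):.3f}")  # 0.632
```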
EVB traffic shaping: barrel shifter
• sources emit to mutually exclusive destinations in a cycle
• works only for fixed-size chunks
• needs synchronisation
[Figure: barrel shifter over Steps 1 to 4; the assignment of an event's fragments to destinations rotates by one position at each step]
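Below is a minimal sketch of the barrel-shifter schedule. It assumes the rotation d = (s + step) mod N; the figure does not fix the exact rotation. At every step the sources target mutually exclusive destinations, and over one cycle each source visits every destination once. The function names are hypothetical.

```python
def barrel_shifter_schedule(n: int) -> list[list[int]]:
    """schedule[step][src] = destination that source `src` sends to at `step`.

    Assumed rotation: in an n-step cycle source s sends its fixed-size chunk
    to (s + step) mod n, so every source visits every destination once.
    """
    return [[(src + step) % n for src in range(n)] for step in range(n)]


def is_conflict_free(schedule: list[list[int]]) -> bool:
    """True if, at every step, no two sources target the same destination."""
    return all(len(set(row)) == len(row) for row in schedule)


sched = barrel_shifter_schedule(4)
for step, row in enumerate(sched, start=1):
    print(f"Step {step}:", {f"src{s}": f"dst{d}" for s, d in enumerate(row)})
print(is_conflict_free(sched))
```

The sketch models only the destination pattern. Each step lasts the same time only when the chunks have a fixed size, which matches the two conditions listed on the slide.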
Banyan network
Example: 8x8 made of 3 stages of 2x2 elements (8 = 2^3)
• single path per connection
• suffers from internal blocking
• number of crosspoints: N log2 N
• for random traffic (no intermediate input queues (IQ) and no output queues (OQ)), efficiency drops with the number of stages s and with N; for "infinite" N, efficiency is 20%
• there exists a non-blocking barrel-shifting pattern
[Figure: 8x8 Banyan network, sources s0 to s7, destinations d0 to d7]
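The sketch below checks the last point for an 8x8 network. It assumes Omega (perfect-shuffle) wiring, one member of the Banyan family, with destination-tag routing. The slide does not say which Banyan wiring is meant, and the helper names are hypothetical. Every barrel-shifter step d = (s + t) mod N gets through without two connections sharing a link. By contrast, an arbitrary pair such as s0 -> d0 and s4 -> d1 already collides after the first stage.

```python
def omega_links(src: int, dst: int, n_bits: int) -> list[int]:
    """Link labels used by src -> dst after each stage of a 2**n_bits Omega network.

    Destination-tag routing: stage k (1-based) consumes bit (n_bits - k) of dst,
    so after stage k the packet is on the link labelled by the low
    (n_bits - k) bits of src followed by the high k bits of dst.
    """
    labels = []
    for k in range(1, n_bits + 1):
        low_src = src & ((1 << (n_bits - k)) - 1)
        high_dst = dst >> (n_bits - k)
        labels.append((low_src << k) | high_dst)
    return labels


def blocks(pairs: list[tuple[int, int]], n_bits: int) -> bool:
    """True if two connections need the same link at the same stage."""
    used = set()
    for src, dst in pairs:
        for stage, label in enumerate(omega_links(src, dst, n_bits)):
            if (stage, label) in used:
                return True
            used.add((stage, label))
    return False


N_BITS = 3  # 8x8 network: 3 stages of 2x2 elements
N = 1 << N_BITS
# every barrel-shifter step (d = s + t mod N) passes without internal blocking
print(all(not blocks([(s, (s + t) % N) for s in range(N)], N_BITS) for t in range(N)))
# an arbitrary pair can collide: s0 -> d0 and s4 -> d1 share the first link
print(blocks([(0, 0), (4, 1)], N_BITS))
```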
A multistage 1024 port switch
Banyan topology: NxN built out of nxn crossbars, N = n^s (s stages)
• basic unit: 8x8 crossbars
• 3 stages: 512x512
• 192 crossbars needed in total
Important to study multistage switches
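The stage and crossbar counts follow from N = n^s, with N/n crossbars in each stage. The small helper below (hypothetical name) reproduces the 192 crossbars for 512x512 out of 8x8 units.

```python
def banyan_size(n_ports: int, radix: int) -> tuple[int, int]:
    """(stages, crossbars) for an N x N Banyan built from radix x radix crossbars.

    Requires N = radix**s; each of the s stages holds N / radix crossbars.
    """
    if radix < 2:
        raise ValueError("crossbar radix must be at least 2")
    stages, size = 0, 1
    while size < n_ports:
        size *= radix
        stages += 1
    if size != n_ports:
        raise ValueError("N must be a power of the crossbar radix")
    return stages, stages * n_ports // radix


print(banyan_size(512, 8))  # (3, 192)
print(banyan_size(16, 4))   # (2, 8)
```

Applied to a 16x16 network out of 4x4 crossbars, it gives 2 stages and 8 crossbars.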
The CMS DAQ system
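[Figure: CMS DAQ system: detector front-end readout, Level-1 trigger (LV1), Readout Units (RU), Event Manager (EVM), Filter Units (FU), control (Ctrl), computing and communication services]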
Myrinet overview
• Myrinet features
• Myrinet switches
• Network Interface Card
Myrinet features
• Myrinet is a System Area Network (SAN): point-to-point links, byte wide, full duplex, 1.3 Gbps per direction, very low error rate
• packet structure: routing header, payload and tail; each crossbar switch strips the leading byte from the routing header
• wormhole routing (versus store-and-forward): no buffering, low latency, arbitrary-length packets
• byte-based flow control (STOP/GO): no packet loss inside the switching fabric
• 3Q 2000: link speed from 1.3 Gbps to 2.6 Gbps
[Figure: packet layout (routing header, payload, CRC) and STOP/GO flow-control signals]
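Source routing with per-switch byte stripping can be sketched as follows. The encoding is an assumption made for illustration: each routing-header byte is simply the output port number. The sketch does not model the real Myrinet header byte format, the tail/CRC, or STOP/GO flow control, and `traverse` is a hypothetical name.

```python
def traverse(packet: bytes, n_hops: int) -> tuple[list[int], bytes]:
    """Follow a source-routed packet through n_hops crossbars.

    Hypothetical encoding: each routing-header byte equals the output port.
    Each crossbar reads the leading byte to pick its output port and strips
    it, so after the last hop only the payload remains.
    """
    ports = []
    for _ in range(n_hops):
        ports.append(packet[0])  # leading byte selects the output port
        packet = packet[1:]      # ...and is removed before the next hop
    return ports, packet


route = bytes([5, 2])            # hypothetical two-stage route
ports, delivered = traverse(route + b"event fragment", n_hops=len(route))
print(ports, delivered)          # [5, 2] b'event fragment'
```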
Myrinet switches
M2M-OCT-SW8:
• 32 ports
• 8 times 4x4 crossbars
[Figure: crossbar port numbering 0 to 7]
• large switch fabrics are built out of 4x4 crossbar elements
• an 8x8 crossbar is now available as the basic element
Network interface card
[Figure: M2M-PCI64 NIC block diagram: LANai7 (RISC, host DMA, send DMA, receive DMA, packet interface), 2 MByte memory and PCI bridge; 32 or 64 bit host PCI (33 or 66 MHz), 64 bit (66 MHz) data path, and 8 bit (80 MHz, NRZ) Myrinet SAN link]
Developed a custom Myrinet Control Program that:
• controls the DMA engines
• implements the low-level communication protocol
Switch tests
• Set-up for switch test
• Traffic conditions tested
• Point-to-point 1x1
• Parameters point-to-point 1x1
• Point-to-point NxN: mutually exclusive paths
• Block on output port
• Block on internal switch
• Random traffic
Demonstrator set-up for switch tests
• 32 Linux PC nodes
• PC: 450 MHz PII BX, PCI 33 MHz/32 bit
• Myrinet switch: M2M-OCT-SW8; NIC: M2M-PCI64[A]
• two-stage Banyan network out of 4x4 crossbars
[Figure: sources and destinations connected through the switch]
Traffic conditions tested
[Figure: port diagrams (ports 1 to 32) for point-to-point traffic (fixed destinations) and random traffic]
Point-to-point 1x1
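• full host - NIC DMA: limited by PCI (33 MHz/32 bit)
• partial host - NIC DMA: full packet from NIC memory to the link, only headers between host and NIC; limited by the SAN link. This allows the switch to be loaded to its maximum.
[Plot: throughput vs. packet size, with the PCI and link limits]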
Parameters point-to-point 1x1
Time per packet = overhead + size / speed
Full host - NIC DMA: speed 128 Mbyte/s -> PCI speed
Partial host - NIC DMA: speed 141 Mbyte/s -> 92% link efficiency
• above 1 kbyte: linear behaviour
• below 1 kbyte: plateau at 5 µs (NIC-host communication)
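The timing model above can be turned into an effective-throughput estimate. The two speeds are the 128 and 141 Mbyte/s quoted. The overhead value is hypothetical, chosen only for illustration. 1 Mbyte is taken as 10^6 bytes, so Mbyte/s equals bytes per µs.

```python
def packet_time_us(size_bytes: float, overhead_us: float, speed_mbyte_s: float) -> float:
    """time per packet = overhead + size / speed (1 Mbyte/s == 1 byte/us here)."""
    return overhead_us + size_bytes / speed_mbyte_s


def throughput_mbyte_s(size_bytes: float, overhead_us: float, speed_mbyte_s: float) -> float:
    """Effective throughput implied by the linear timing model."""
    return size_bytes / packet_time_us(size_bytes, overhead_us, speed_mbyte_s)


OVERHEAD_US = 5.0  # hypothetical value, for illustration only
for speed, label in ((128.0, "full DMA (PCI)"), (141.0, "partial DMA (link)")):
    for size in (256, 1024, 4096, 65536):
        rate = throughput_mbyte_s(size, OVERHEAD_US, speed)
        print(f"{label:20s} {size:6d} bytes: {rate:6.1f} Mbyte/s")
```

The throughput approaches the asymptotic speed only when size / speed dominates the fixed overhead. For small packets, the overhead dominates.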
Point-to-point NxN - Mutually exclusive paths
[d = 4*(s%4) + s/4 (integer division), s = 0-15]
As expected, the aggregate throughput through the switch is linear in N.
[Plot: aggregate throughput for 1x1, 4x4, 8x8 and 16x16]
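The pattern d = 4*(s%4) + s/4 can be checked against a simple two-stage model of the 16x16 network. The assumed wiring is hypothetical and stands in for the real cabling: output j of first-stage crossbar i feeds second-stage crossbar j, which serves destinations 4j to 4j+3. The function names are also hypothetical.

```python
def exclusive_pattern(s: int) -> int:
    """Destination for source s: d = 4*(s%4) + s/4 with integer division."""
    return 4 * (s % 4) + s // 4


def middle_link(s: int, d: int) -> tuple[int, int]:
    """Inter-stage link used by s -> d.

    Assumed wiring (hypothetical): output j of first-stage crossbar i feeds
    second-stage crossbar j, which serves destinations 4*j .. 4*j + 3.
    """
    return (s // 4, d // 4)


pairs = [(s, exclusive_pattern(s)) for s in range(16)]
print(sorted(d for _, d in pairs) == list(range(16)))  # one source per destination
print(len({middle_link(s, d) for s, d in pairs}))      # 16 distinct inter-stage links
print(len({middle_link(s, s) for s in range(16)}))     # d = s would use only 4 links
```

In the same model, consider two sources on one first-stage crossbar that send to different destinations behind the same second-stage crossbar. They share a link, which is the situation in the internal-blocking test.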
Block on output port
Force m (= 1, 2, 3, 4) sources on the same destination: each source gets 1/m of Vmax (measured at source #0).
[Plot: throughput at source #0 for m = 1, 2, 3, 4]
Block on internal switch
Force 2 sources on different destinations, but through the same intermediate path (measured at source #0).
As expected, the throughput plateaus at Vmax/2.
Random traffic
Sources send, independently, to a random destination drawn from a uniform distribution; throughput is measured at the destinations.
Efficiency:
• 4x4: 69% (expected 68%)
• 16x16: 51%
Limited by head-of-line blocking.
[Plot: throughput at the destinations for 1x1, 4x4 and 16x16 under random traffic]
Event building studies
• EVB demonstrator set-up
• Event building protocol
• Variable size event fragments
• Event building performance
• Event building: scaling behaviour
• Traffic shaping
• EVB performance with traffic shaping
• Performance for variable size event fragments
• EVB with traffic shaping: scaling behaviour
• Traffic shaping: time evolution
EVB demonstrator set-up
• 32+1 Linux PCs [450 MHz PII BX, PCI 33 MHz/32 bit]
• Myrinet switch: M2M-OCT-SW8; NIC: M2M-PCI64[A]
• 16x16 two-stage Banyan network out of 4x4 crossbars
• Myrinet between RUs and BUs (full duplex): N-to-N traffic
• Fast Ethernet between BUs and EVM: N-to-1 traffic
• No emulation of Level-1 trigger
[Figure: EVM, PCs emulating RUs and PCs emulating BUs]
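Event building protocol
• BU -> EVM: Request EvtId
• EVM -> BU: EvtId
• BU -> RU: Request Data
• RU -> BU: Send Data
• BU -> EVM: Clear EvtId
Several EvtId messages are grouped in a single Ethernet packet.
[Figure: Level-1 feeding the EVM; RUs and BUs connected by the Myrinet builder network; BUs connected to the EVM]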
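The message sequence can be sketched in a few lines of Python. Class and method names are hypothetical. Transport (Fast Ethernet, Myrinet) and timing are not modelled, and fragments are placeholder tuples. The sketch only shows the order Request EvtId, EvtId, Request Data, Send Data, Clear EvtId, with several EvtIds handed out per request.

```python
from collections import deque


class EventManager:
    """EVM: hands out event IDs for events accepted by Level-1 (hypothetical API)."""

    def __init__(self, level1_ids):
        self.pending = deque(level1_ids)  # IDs of events accepted by Level-1
        self.in_flight = set()            # IDs assigned to a BU, not yet cleared

    def request_evt_ids(self, n):
        """Reply with up to n EvtIds in one message (one Ethernet packet on the slide)."""
        ids = [self.pending.popleft() for _ in range(min(n, len(self.pending)))]
        self.in_flight.update(ids)
        return ids

    def clear_evt_id(self, evt_id):
        self.in_flight.remove(evt_id)


class ReadoutUnit:
    def __init__(self, ru_id):
        self.ru_id = ru_id

    def send_data(self, evt_id):
        """Answer a BU's Request Data with this RU's fragment (a placeholder tuple)."""
        return (self.ru_id, evt_id)


class BuilderUnit:
    def __init__(self, evm, rus):
        self.evm, self.rus = evm, rus
        self.built = {}  # evt_id -> list of fragments, one per RU

    def build(self, batch=4):
        for evt_id in self.evm.request_evt_ids(batch):                     # Request EvtId / EvtId
            self.built[evt_id] = [ru.send_data(evt_id) for ru in self.rus]  # Request / Send Data
            self.evm.clear_evt_id(evt_id)                                   # Clear EvtId
        return len(self.built)


evm = EventManager(range(10))
bu = BuilderUnit(evm, [ReadoutUnit(i) for i in range(4)])
print(bu.build(batch=4), sorted(evm.in_flight), len(evm.pending))  # 4 [] 6
```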
Variable size event fragments
Log-normal distribution, mimics CMS data readout; example: average = 2 kbyte, RMS = 2 kbyte
[Figure: event building (EVB) of variable-size fragments from Readout Units to Builder Units]
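To draw fragment sizes with a given average and RMS from a log-normal distribution, solve for the parameters of the underlying normal: sigma^2 = ln(1 + (RMS/mean)^2) and mu = ln(mean) - sigma^2/2. This sketch assumes 1 kbyte = 1024 bytes and treats RMS as the standard deviation. The function names are illustrative.

```python
import math
import random


def lognormal_params(mean: float, rms: float) -> tuple[float, float]:
    """(mu, sigma) of the underlying normal giving a log-normal with this mean and RMS."""
    sigma2 = math.log(1.0 + (rms / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def fragment_sizes(n: int, mean: float = 2048.0, rms: float = 2048.0, seed: int = 1) -> list[int]:
    """Draw n fragment sizes in bytes (assumes 1 kbyte = 1024 bytes)."""
    mu, sigma = lognormal_params(mean, rms)
    rng = random.Random(seed)
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(n)]


sizes = fragment_sizes(100_000)
m = sum(sizes) / len(sizes)
r = math.sqrt(sum((x - m) ** 2 for x in sizes) / len(sizes))
print(f"sample mean {m:.0f} bytes, sample RMS {r:.0f} bytes")
```

With 100 000 samples, the sample mean and RMS should both come out near 2048 bytes.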
Event building performance
Fragment rate per node† for 16x16: 30 kHz for 2 kbyte fragments
• no traffic shaping
• fixed size event fragments
[Plot: throughput vs. fragment size for 1x1, 4x4, 8x8 and 16x16; 2 kbyte marked; one region labelled "unstable"]
Results:
• 1x1 is close to point-to-point
• performance decreases from 4x4 to 8x8 to 16x16, as expected
• from small sizes: overhead 7 µs
† Fragment rate per node = Level-1 rate
Event building - scaling behaviour
• take an average fragment size of 2 kbyte
• also variable size fragments
Results:
• reduced performance for variable size fragments, as expected
• no scaling in N
Need simulation for large N
Traffic shaping
• Sources divide fragments into fixed size packets (blocks) and cycle through all destinations
• Inspired by ATM rate division (block size is 53 bytes)
• Should work for a large-N multistage switch as well
Implementation:
• performed by the NIC control program
• block size set to 4 kbyte (30 µs cycle)
• barrel shifter without external synchronisation (Myrinet back pressure by HW flow control)
• packets can be (partially) empty
[Figure: RU0 to RU3 sending blocks to BU0 to BU3 in a barrel-shifter cycle]
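Below is a minimal sketch of one barrel-shifter cycle at a single source. The source keeps a byte count per Builder Unit and sends exactly one fixed-size block to each BU in turn. When it has less data, or none, it pads the block. The helper name and the 4096-byte block size (reading 4 kbyte as 4 x 1024 bytes) are assumptions. Back pressure from the Myrinet flow control is not modelled.

```python
def shaped_cycle(pending: list[int], src: int, block_size: int = 4096) -> list[tuple[int, int]]:
    """One barrel-shifter cycle at source `src` (hypothetical helper).

    pending[bu] = bytes this source still owes Builder Unit `bu`; updated in place.
    At step t the source sends exactly one block to BU (src + t) % N. The block
    carries up to block_size data bytes and is padded otherwise, so it may be
    partially or completely empty. Returns (bu, data_bytes_in_block) in send order.
    """
    n = len(pending)
    blocks = []
    for step in range(n):
        bu = (src + step) % n
        data = min(block_size, pending[bu])
        pending[bu] -= data
        blocks.append((bu, data))
    return blocks


queue = [6000, 4096, 1000, 0]      # bytes RU0 owes BU0..BU3 (made-up numbers)
print(shaped_cycle(queue, src=0))  # [(0, 4096), (1, 4096), (2, 1000), (3, 0)]
print(shaped_cycle(queue, src=0))  # [(0, 1904), (1, 0), (2, 0), (3, 0)]
```

Every step sends a block of the same length regardless of how much data it carries. This equal length is what makes the schedule behave like the fixed-chunk barrel shifter even with variable-size fragments.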
EVB performance with traffic shaping
• fixed size event fragments
Results:
• close to point-to-point
Fragment rate per node for 16x16: 65 kHz for 2 kbyte fragments
[Plot: throughput vs. fragment size, 2k and 4k marked]
Performance for variable size event fragments
[Plot: throughput for variable size event fragments, 2k marked]
Efficiency decreases with larger RMS of the fragment size distribution (in agreement with Monte Carlo).
Fragment rate per node for a nominal average of 2 kbyte and RMS of 2 kbyte†: 60 kHz
† with full host-NIC DMA: about 80 Mbyte/s or 40 kHz
EVB traffic shaping - scaling behaviour
EVB with traffic shaping: approximate scaling
Traffic shaping - time evolution (I)
[Plot: BS cycling rate * block size vs. time]
At 23:00 (?):
• throughput dropped
• the traffic shaping barrel shifter stayed in sync
2 hours (= 2 x 10^8 cycles, 10 Tbyte moved)
Traffic shaping - time evolution (II)
1 hour (= 10^8 cycles)
[Plot: BS cycling rate * block size vs. time, perturbations 1 and 2 marked]
Perturb the system:
1: slow down RU1: all BUs get a reduced rate
2: slow down BU1: only BU1 gets a reduced rate
The traffic shaping barrel shifter stays in sync.
Future work and conclusions
• Future work
• Conclusions
Future work
• Evaluate Myrinet 2000 (available 3Q 2000): link speed from 1.3 Gbps to 2.6 Gbps, switches based on 8x8 crossbars as elementary units
• Further study of traffic shaping
• Simulation: extrapolate to large systems
Conclusions
• Event builder demonstrator (16x16) based on a Myrinet multistage switch and Linux PCs established
• Performed systematic switch studies: results as expected
• Measured event building performance: without traffic shaping, no scaling (as expected); with traffic shaping, approximate scaling
• For nominal event fragment sizes with average and RMS of 2 kbyte, achieved about 60 kHz trigger rate or 120 Mbyte/s per node (almost 2 Gbyte/s aggregate). That is, today, a factor two off from CMS needs, assuming scaling.
• Measurements provide parameters for simulation of large-scale (512x512) systems
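The throughput numbers in the conclusions follow from fragment rate x fragment size. This sketch assumes 1 kbyte = 10^3 bytes and takes 100 kHz as the CMS Level-1 target.

```python
def node_throughput_bytes_s(trigger_rate_hz: float, fragment_bytes: float) -> float:
    """Each node handles one fragment per Level-1 trigger."""
    return trigger_rate_hz * fragment_bytes


per_node = node_throughput_bytes_s(60e3, 2e3)  # measured: about 60 kHz, 2 kbyte fragments
aggregate = 16 * per_node                      # 16x16 demonstrator
needed = node_throughput_bytes_s(100e3, 2e3)   # CMS target: 100 kHz Level-1 rate
print(per_node / 1e6, "Mbyte/s per node")      # 120.0
print(aggregate / 1e9, "Gbyte/s aggregate")    # 1.92
print(round(needed / per_node, 2))             # 1.67
```

The shortfall comes out at about 1.7, which the conclusions round to a factor two.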
Extra Material
Multi-step Event Building
Step 1: at 100 kHz, the High Level Trigger achieves a rejection factor of 10 using 0.25 of the data
Step 2: at 10 kHz, the remaining 0.75 of the data
Throughput reduced to 0.25 + 0.1 x 0.75 ≈ 0.33 of the single-step value, i.e. by a factor 3, at the cost of control complexity and increased latency
• With a link speed of 1 Gbps, a factor 2 is needed from multi-step event building for a 100 kHz Level-1 rate (assuming a 100% efficient switch)
• If higher speed links are available in 2003-2004, then a single-step event builder
[Figure: two-step event building at 100 kHz and 10 kHz]
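The multi-step arithmetic can be written as fraction = a + (1 - a)/R, where a is the share of data moved in step 1 and R is the HLT rejection factor. The sketch also computes the per-node load for single-step building at 100 kHz with 2 kbyte fragments, assuming 1 kbyte = 10^3 bytes. The names are illustrative.

```python
def multistep_fraction(first_step_share: float, rejection: float) -> float:
    """Builder-network load relative to single-step event building.

    Step 1 moves `first_step_share` of each event at the full Level-1 rate;
    step 2 moves the rest only for the 1/rejection events the HLT keeps.
    """
    return first_step_share + (1.0 - first_step_share) / rejection


f = multistep_fraction(0.25, 10.0)
print(round(f, 3), round(1.0 / f, 2))  # 0.325 3.08

# Per-node link load at 100 kHz with 2 kbyte fragments (1 kbyte = 1000 bytes assumed)
single_step_gbps = 100e3 * 2e3 * 8 / 1e9
print(single_step_gbps, round(single_step_gbps * f, 2))
```

Single-step building needs about 1.6 Gbps per node before any switch inefficiency. This agrees with the statement that 1 Gbps links need roughly a factor 2.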