TELL1: The DAQ interface board for the LHCb experiment
Gong Guanghua, Gong Hui, Hou Lei (DEP, Tsinghua Univ.)
Guido Haefeli (EPFL, Lausanne)
Real Time 2009, 15 May 2009, IHEP, Beijing
Outline
Introduction
Hardware
Onboard Signal processing
Future upgrade plan
Summary
TELL1 in LHCb DAQ architecture
Figure borrowed from TDA2-2, Federico Alessio
[Figure: LHCb DAQ architecture. Blocks shown: Detector (VELO, ST, OT, RICH, ECal, HCal, Muon), FE electronics, L0 trigger, Timing & Fast Control (LHC clock), readout boards, readout network of switches (event requests, event building), High-Level Trigger farm CPUs, monitoring farm, offline.]
Figure annotations:
• 5000 optical/analog links through the shielding wall, O(4 Tb/s)
• 320 ROBs, each with 24 inputs (per-input rate garbled in extraction) and 4 outputs @ 1 Gb/s
• 3000 GbE ports, 35 GB/s
• 50 subfarms of ~40 nodes
• 50 TB with 70 MB/s
TELL1
A common readout board for all LHCb detectors (except RICH)
Input:
◦ analogue copper / digital optical
◦ 30 Gbit/s input data rate
Process:
◦ 5 large FPGAs
◦ Processes events at 1.11 MHz
Buffer:
◦ 96 MB DDR SDRAM for each PP
◦ 2 MB QDR SRAM for SyncLink
◦ FPGA internal RAM
Output:
◦ 4 Gbps copper links
[Figure: TELL1 block diagram. Labels: FE links; A-Rx (16 x 10-bit) and O-Rx (12 x 1.28 Gbit/s) receiver cards; four PP-FPGAs, each with an L1B buffer; SyncLink-FPGA; FEM; TTCrx; ECS; RO-Tx (4 x 1 Gbit/s); external connections ECS, TTC, HLT, L1T, Throttle.]
TELL1 overview
Advantages of being common: cost reduction, shared development work, easy maintenance.
[Board photo. Labels: 4 x 16 analog @ 10 bit x 40 MHz or 2 x 12 optical @ 16 bit x 80 MHz; 4 x PreProcess FPGA; DDR buffer; Credit Card PC & Glue Card; Dual (Quad) GBE; SyncLink FPGA; TTCrx.]
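The two receiver options quoted on the board photo can be checked against the 30 Gbit/s input rate stated earlier. The sketch below is only arithmetic on the slide's figures; reading "10x40MHz" as 10-bit words at 40 MHz and "16x80MHz" as 16-bit words at 80 MHz is an assumption, and the function name is illustrative.

```python
# Raw input bandwidth of one TELL1 board for the two receiver options.
# All numbers come from the slide; only the arithmetic is added here.

def input_bandwidth_gbit(cards, channels_per_card, bits_per_word, word_rate_mhz):
    """Return the raw input bandwidth in Gbit/s."""
    bits_per_second = cards * channels_per_card * bits_per_word * word_rate_mhz * 1e6
    return bits_per_second / 1e9

# 4 analog cards x 16 channels, 10-bit samples at 40 MHz
analog_gbit = input_bandwidth_gbit(cards=4, channels_per_card=16,
                                   bits_per_word=10, word_rate_mhz=40)
# 2 optical cards x 12 channels, 16-bit words at 80 MHz (1.28 Gbit/s per link)
optical_gbit = input_bandwidth_gbit(cards=2, channels_per_card=12,
                                    bits_per_word=16, word_rate_mhz=80)

print(f"analog : {analog_gbit:.2f} Gbit/s")
print(f"optical: {optical_gbit:.2f} Gbit/s")
```

Under this reading the optical option gives about 30.7 Gbit/s, which is consistent with the quoted 30 Gbit/s, and the analog option about 25.6 Gbit/s.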
TELL1 mezzanines
ARX:
◦ 16-channel analog card for VELO (40 Msps, 10-bit ADC)
ORX:
◦ 12-channel optical card for all other detectors, 1.28 Gbps
CCPC:
◦ Credit Card PC running Linux as the control interface
GBE:
◦ 4-channel Gigabit Ethernet card
FEM:
◦ a cycle-accurate emulation of the front-end electronics
TELL1 in LHCb
The TELL1 board is used by all but the RICH sub-detector in the LHCb experiment.
VELO 83
Silicon Tracker (IT,TT) 42+48
Outer Tracker 48
ECAL,HCAL,PCAL 44
MUON 20
L0PUL 5
--- total of 290 boards needed ---
TELL1 framework
Common functions provided as the “TELL1 framework”
◦ Input event synchronization
◦ Hardware interfaces: Memory, TFC, GBE, CCPC
◦ Event fragment assembly
◦ Ethernet packet assembly
◦ Data flow control
◦ Monitoring and statistics
Detector-specific functions developed by the detector groups
◦ FIR
◦ Re-ordering
◦ Pedestal subtraction
◦ Common mode correction
◦ Zero suppression
[Figure: framework data path. Labels: Input (from receivers), Signal cleanup, Clustering & Zero suppression, Common function, Output (to Gigabit MAC), with interfaces to Memory, CCPC and TFC.]
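To make the detector-specific steps listed above concrete, here is a minimal software sketch of pedestal subtraction, common mode correction and zero suppression on one block of 32 channels. It is not the firmware algorithm of any detector group: the block-mean common-mode estimate, the function name `clean_and_suppress`, the pedestal values and the threshold are all assumptions chosen for illustration.

```python
from statistics import mean

def clean_and_suppress(adc, pedestals, threshold):
    """Return (channel, value) pairs that survive zero suppression."""
    # 1. Pedestal subtraction: remove each channel's static offset.
    signal = [a - p for a, p in zip(adc, pedestals)]
    # 2. Common mode correction (assumed here: subtract the block mean).
    common_mode = mean(signal)
    corrected = [s - common_mode for s in signal]
    # 3. Zero suppression: keep only channels above the threshold.
    return [(ch, v) for ch, v in enumerate(corrected) if v > threshold]

# Hypothetical input: 32 channels, a common shift of +3 ADC counts,
# and one real hit on channel 7.
pedestals = [512] * 32
adc = [515] * 32
adc[7] = 575

hits = clean_and_suppress(adc, pedestals, threshold=20)
print(hits)
```

Only the hit channel is left after suppression, which is why the output size becomes occupancy dependent while the cleanup steps before it handle a fixed amount of data per event.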
Array of processors
[Figure: PP-FPGA internals. Labels: From receivers; parallel Sync / Mux / Mem / Proc chains; Linker; IF; Pace Maker; ECS; TTC.]
The 1.1 MHz trigger rate calls for fully pipelined, simultaneous processing.
One event can be processed every 900 ns!
[Figure: two groups of eight input channels, each delivering 32 x Data + 4 x Header words per event to its own Proc unit.]
Data preparation for processing
The data has to be available simultaneously to all processing channels!
To be more efficient the processing clock frequency is increased and the data is multiplexed!
[Figure: multiplexing of the 8-bit input channels onto the Proc units, time axis t in 120 MHz cycles. One 900 ns event period is split into two slots of 4 header + 32 data + 18 further cycles, with tick marks at 0, 4, 36, 54, 58, 90 and 108. Event 1 and Event 2 (32 x Data + 4 x Header per channel) follow each other through the Proc units.]
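The numbers on the timing figure can be cross-checked with a few lines of arithmetic. The clock frequency, event period and word counts are from the slides; that exactly two channels share one processor is an assumption inferred from the two slots per event period.

```python
from fractions import Fraction

CLOCK_MHZ = 120          # processing clock
EVENT_PERIOD_NS = 900    # one event every 900 ns
HEADER_WORDS = 4
DATA_WORDS = 32
CHANNELS_PER_PROC = 2    # assumption: two input channels share one processor

cycle_ns = Fraction(1000, CLOCK_MHZ)                  # 25/3 ns, about 8.33 ns
cycles_per_event = Fraction(EVENT_PERIOD_NS) / cycle_ns
cycles_per_slot = cycles_per_event / CHANNELS_PER_PROC
spare_cycles = cycles_per_slot - (HEADER_WORDS + DATA_WORDS)

print("cycles per event:", cycles_per_event)
print("cycles per slot :", cycles_per_slot)
print("spare cycles    :", spare_cycles)
```

Exact fractions are used so that the cycle counts come out as integers rather than rounded floating-point values.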
Signal processing
Some processes run at a fixed rate of 900 ns/event: FIR, pedestal subtraction, re-ordering, common-mode suppression.
Zero suppression: for high-occupancy events the cluster encoding increases the data size, so long processing times can occur.
De-randomization is required for zero suppression processing! Large buffers and buffer overflow control are needed. The processing can still be pipelined, but the average processing time must be respected.
[Figure: processing chain. Labels: Sync, FIR, CMS (fixed input rate, 900 ns/event); derandomizer buffers with overflow control; Zero Supp.; Event linker; MEP assembly; Eth/IP; average event rate < 900 ns.]
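The role of the derandomizer buffer can be illustrated with a small queue model: events arrive at a fixed 900 ns spacing, each takes a variable time to zero-suppress, and the buffer absorbs the difference as long as the average stays below 900 ns. The function name and the processing times are made up for illustration; this is a software sketch, not the firmware logic.

```python
def max_buffer_depth(processing_times_ns, period_ns=900):
    """Largest number of events waiting or in service at any arrival."""
    finish_times = []     # completion time of every event so far
    server_free_at = 0    # time at which the processor becomes idle
    max_depth = 0
    for i, proc_ns in enumerate(processing_times_ns):
        arrival = i * period_ns
        start = max(arrival, server_free_at)   # wait if the processor is busy
        server_free_at = start + proc_ns
        finish_times.append(server_free_at)
        # Count events that have arrived but are not finished yet.
        depth = sum(1 for t in finish_times if t > arrival)
        max_depth = max(max_depth, depth)
    return max_depth

# Hypothetical run: one high-occupancy event needs 2500 ns, the rest 300 ns.
times = [300, 300, 2500, 300, 300, 300, 300, 300]
print("average processing time:", sum(times) / len(times), "ns")
print("maximum buffer depth   :", max_buffer_depth(times))
```

The average here is 575 ns, below the 900 ns budget, yet the single slow event still forces several events to queue. That is the case the large buffers and the overflow control are there to handle.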
Event driven process
[Figure: the processing chain again (Sync, FIR, CMS, derandomizer buffers with overflow control, Zero Supp., Event linker, MEP assembly, Eth/IP), with an event buffer between consecutive processes.]
• No state exchange between processes
• A process is activated when
  • its input buffer is not empty
  • its output buffer is not full
• Back-pressure propagates backwards through the chain; events pile up in the derandomizer buffer
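The activation rule above (input buffer not empty, output buffer not full) is enough to produce back-pressure without any shared state. The sketch below models it in software with bounded queues; the class `Stage`, the two toy processing functions and the buffer sizes are hypothetical.

```python
from collections import deque

class Stage:
    """A process that fires only when it has input and room for output."""
    def __init__(self, func, inbox, outbox, out_capacity):
        self.func = func
        self.inbox = inbox
        self.outbox = outbox
        self.out_capacity = out_capacity

    def ready(self):
        return len(self.inbox) > 0 and len(self.outbox) < self.out_capacity

    def step(self):
        if not self.ready():
            return False
        self.outbox.append(self.func(self.inbox.popleft()))
        return True

def run(stages, ticks):
    for _ in range(ticks):
        for stage in stages:
            stage.step()

source = deque([1, 2, 3, 4, 5])   # stands in for the derandomizer buffer
mid = deque()
sink = deque()                    # output that nobody drains in this demo

first = Stage(lambda x: x * 2, source, mid, out_capacity=2)
second = Stage(lambda x: x + 1, mid, sink, out_capacity=1)

run([second, first], ticks=10)    # downstream stage is stepped first
print("source:", list(source))
print("mid   :", list(mid))
print("sink  :", list(sink))
```

Because the sink is never read, the second stage stalls once its output is full, the intermediate buffer fills, and the remaining events stay in the source. Removing an item from `sink` and calling `run` again lets the chain advance.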
Process schedule
The Pace Maker imposes the correct timing on the distributed processors.
Every 450 ns a new processing cycle is started and the incoming data is processed.
Fixed data size and fixed event rate are used to pipeline the processing.
Periodic processing cycle counter.
[Figure: the Pace Maker driving the Proc units on the 120 MHz clock. Each 450 ns cycle is labelled with 4 header and 32 data words, followed by 18 further cycles.]
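A periodic cycle counter of this kind can be sketched as follows. The slot length of 54 cycles follows from 450 ns at 120 MHz; the ordering header, then data, then spare cycles, and the names `phase` and `SLOT_CYCLES`, are assumptions made for this illustration rather than details confirmed by the slides.

```python
SLOT_CYCLES = 54      # 450 ns at 120 MHz
HEADER_CYCLES = 4
DATA_CYCLES = 32

def phase(cycle):
    """Map a free-running cycle count to (slot index, phase name)."""
    slot, offset = divmod(cycle, SLOT_CYCLES)
    if offset < HEADER_CYCLES:
        name = "header"
    elif offset < HEADER_CYCLES + DATA_CYCLES:
        name = "data"
    else:
        name = "spare"   # assumed idle cycles at the end of each slot
    return slot, name

# Phase boundaries over one 900 ns event period (108 cycles).
for cycle in (0, 4, 36, 54, 58, 90):
    print(cycle, phase(cycle))
```

Every processor can derive its state from the same counter value, so no processor needs to exchange timing information with its neighbours.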
Many other features of the “TELL1 framework”
Interface logic
◦ Memory, GBE, TTC, ECS
Event merging
◦ Merge of different process channels
◦ Merge of 4 PP-FPGAs
◦ Merge of different data banks
Monitoring and statistics functions
◦ Polarity/CRC check for all data buses
◦ Counters, data rate statistics
◦ Buffer usage monitor
◦ Histograms
Run library for CCPC
◦ Programming all FPGAs
◦ Useful routines for configuration, monitoring and control
[Figure: framework data path from the receivers (input) through the common function to the Gigabit MAC (output).]
The framework provides a transparent transfer example.
Tell1 so far
Designed in 2003; the FPGA technology is Altera Stratix I.
Processing power: 125K LE @ 120 MHz.
Only small external buffering: 2 Mbyte for the DAQ interface.
Limitations of Tell1
The DAQ bandwidth is 4 Gbit/s; it should be more than 10 Gbit/s for certain detectors. OT and CAL in LHCb suffer from this.
The input bandwidth should be as large as possible to reduce the number of modules.
Buffering of the raw data stream is required for NA62.
The CCPC limits the speed of the slow control access.
From Tell1 to Tell5
Optical reception without receiver mezzanine: direct reception with optical transceivers on the FPGAs, while still supporting the current 4 x 200-pin interface.
Upgrade to a faster CCPC module and increase the speed of the slow control access to the FPGAs.
Add a soft- or hard-coded microprocessor to each FPGA, or at least to the main control (SyncLink) FPGA.
Add sufficient SDRAM to buffer the raw data stream without suppression.
Add 1 or 2 10-Gigabit Ethernet slots, compatible with 10G copper PHY modules XFP+ (10G over Cat 6 UTP).
[Figure: Tell5 concept. Labels:]
◦ Full-speed raw data buffering possible, 20 Gbyte/s bandwidth
◦ Very large processing block (5 x Tell1)
◦ FPGA on-chip transceivers used
◦ 20 Gbit/s FPGA-FPGA interconnect (5 x Tell1)
◦ 48 x 1.28 Gbit/s optical (2 x Tell1)
◦ 64 x 10 bit, Tell1 IF
◦ Microprocessor for control tasks
◦ One or two 10 Gbit Ethernet (copper)
From Tell1 to Tell40
The very long timescale to the first possible upgrade to 40 MHz readout requires dividing the design of the final board into two major steps.
The TELL40 should be as flexible as possible, to adapt to future changes in requirements and technologies:
◦ New bigger, faster, more serial FPGAs.
◦ Advances in network technology; we expect the change from 10GE to 100GE.
◦ Advanced packaging of serial optical links, 12-way 10G transceivers.
◦ Multi-channel 10GE MAC
◦ Multi-channel 10G transceiver chips
◦ New protocols, SPI-4.2, Interlaken …?
Modular design compatible with current and future data link interfaces from the detectors.
◦ Compatible with the current LHCb optical link
◦ Compatible with the GBT developed by CERN.
◦ Compatible with parallel interfaces such as analog Rx, but also TDCs.
Architectural overview (preliminary)
[Figure: one processing lane. Labels: ORX, 1.28 Gbit/s; VSC3441 1:10 deserializers with 10-bit outputs, 0.125 to 6.375 GHz; FPGA; DDR2(3) SODIMM, 64-bit, 64 x 0.4 GHz x 2 = 51 Gbit/s / SODIMM; Interlaken, up to 5 x 6.25 GHz; XAUI; XGMII or parallel; 4 or 8 x; CS3477; four XFI 10 Gbit/s links to SFP+ with ctrl; ECS FPGA with ctrl lines; CCPC; Ethernet.]
Advantages
With the 4 x 10-GE MAC it is a very balanced solution regarding I/O.
16 TX/RX at 3.125 GHz (low speed) or 8-10 TX/RX at 6.25 GHz (high speed) are needed. These are available in the midrange Altera Stratix 4 GX or Xilinx Virtex 5 LXT, SXT (FXT, TXT: faster serial); Virtex 6 has also just been announced.
The input receiver operates in the GBT speed region, with only a 10-bit parallel interface at a data speed of 640 MHz.
Low logic consumption due to the external MAC.
SFP+ compatible chip.
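The I/O figures quoted for the lane can be put side by side with a little arithmetic. All inputs are numbers from the slides; the helper names are illustrative, the totals are raw line rates without any encoding overhead, and taking exactly 8 lanes for the high-speed option (the slide says 8-10) is a choice made for this example.

```python
def ddr_bandwidth_gbit(bus_bits=64, clock_ghz=0.4, transfers_per_clock=2):
    """Raw bandwidth of one SODIMM: bus width x clock x double data rate."""
    return bus_bits * clock_ghz * transfers_per_clock

def total_line_rate(lanes, rate_per_lane):
    """Aggregate raw serial rate of a group of transceiver lanes."""
    return lanes * rate_per_lane

sodimm_gbit = ddr_bandwidth_gbit()                   # per SODIMM
low_speed = total_line_rate(16, 3.125)               # 16 TX/RX at 3.125 GHz
high_speed = total_line_rate(8, 6.25)                # 8 TX/RX at 6.25 GHz
ethernet_gbit = total_line_rate(4, 10)               # 4 x 10GE output
receiver_gbit = total_line_rate(10, 0.64)            # 10-bit bus at 640 MHz

print(f"SODIMM          : {sodimm_gbit:.1f} Gbit/s")
print(f"16 x 3.125      : {low_speed:.1f}")
print(f"8 x 6.25        : {high_speed:.1f}")
print(f"4 x 10GE        : {ethernet_gbit:.1f} Gbit/s")
print(f"10 bit x 640 MHz: {receiver_gbit:.1f} Gbit/s")
```

The two serial options carry the same raw aggregate rate, and one SODIMM offers somewhat more raw bandwidth than the four 10GE outputs combined.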
“Lane” with external MAC and Rx mezzanine, 4 x 10GE (preliminary)
[Figure: one lane. Labels: VSC3441 deserializers on the Rx mezzanine connector; Stratix 4 GX (1517); two SO-DIMM connectors; Cortina CS3477 (1024-pin); SFP+ transceivers; USB; JTAG; PCIe x1; OSC; Power; EEPROM; RJ45; 1GE MAC; FLASH.]
9U board with 3 “Lanes” of 4 x 10GE each (preliminary)
[Figure: 9U board built from three lanes, each with VSC3441 deserializers, a Stratix 4 GX (1517), two SO-DIMM connectors, a Cortina CS3477 (1024-pin), SFP+ transceivers, connectors, USB, OSC, power and EEPROM. Shared blocks: PCIe x4; TTCrx mezzanine; SMA200 Micro Space module (65 x 58 x 9 mm, 2-3 W); MAC; RJ45; Power; ECS FPGA.]
Rx mezzanine for current LHCb (preliminary)
[Figure: VSC3441 deserializers, SNAP-12 12 x Rx module, connector.]
Rx mezzanine for GBT (preliminary)
[Figure: three GBT chips, twelve SFP+ transceivers, connector.]
Summary
TELL1 has been successfully used in the LHCb experiment.
◦ 290 in use
◦ Used for almost all detectors
◦ Interest has been shown by other experiments
The common firmware framework makes detector development easy
◦ Detector groups focus on their specific code
◦ The main part of the framework has been stable for years
◦ Reliability, flexibility
We are looking forward to the upgrade
◦ Increased input/output bandwidth
◦ More processing resources
◦ Faster control interface
Thank you for your attention!