![Page 1: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/1.jpg)
System performance monitoring in the
ALICE Data Acquisition System
with Zabbix
Adriana Telesca
October 15th, 2013
CHEP 2013, Amsterdam
![Page 2: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/2.jpg)
The ALICE Data Acquisition system
ALICE at the CERN LHC
Data Acquisition system requirements:
• 4 GB/s sustained recording rate
• 2.5 GB/s transfer to tape
15/10/2013 Adriana Telesca, CHEP 2013 2/24
![Page 3: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/3.jpg)
The ALICE Data Acquisition system
For Run 2 (2015-2017): ~1000 nodes
• Readout
• Event building
• Recording
• Storage
• Support (network, PDUs)
• Operations
For Run 3 (2019-2021): ~2000 nodes
![Page 4: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/4.jpg)
O2: a new combined online and offline computing for ALICE after 2018
P. Vande Vyvre’s talk today at 16:45 – Data Acquisition track
![Page 5: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/5.jpg)
Lemon
Lemon was used to monitor the DAQ system during Run 1 (2008-2013).
Decision to replace it:
• Future of Lemon uncertain
• Other tools offer additional/new functionality
• LHC Long Shutdown 1
![Page 6: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/6.jpg)
ALICE DAQ monitoring system needs
Low impact
Extensibility/Flexibility
Scalability
![Page 7: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/7.jpg)
ALICE DAQ monitoring system needs
Full administration GUI
Easy access to data
Interface with other components
ORTHOS (alarming system)
![Page 8: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/8.jpg)
Parameters to monitor
• CPU, Memory, Disk usage, Network Interfaces, Processes
• Voltage, Current, Temperature, Outlet status, Disk status
• Ethernet: CPU utilization, Memory utilization, Cards temperature
• Fiber Channel: RX/TX ports rate
• Readout links Bytes In/Out; DAQ XOFF, HLT XOFF; Processes CPU and memory
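As an illustration only, the host-level parameters listed first (CPU, memory, disk usage, network interfaces, processes) could be collected through standard Zabbix agent item keys. The mapping below is an assumption and is not taken from the slides: the filesystem `/`, the interface `eth0`, the process name `readout`, and the helper `parse_item_key` are all hypothetical.

```python
# Hypothetical mapping from the slide's host parameters to Zabbix agent
# item keys. The concrete arguments (/, eth0, readout) are placeholders.
ITEM_KEYS = {
    "CPU": "system.cpu.util[,user]",
    "Memory": "vm.memory.size[available]",
    "Disk usage": "vfs.fs.size[/,pused]",
    "Network Interfaces": "net.if.in[eth0]",
    "Processes": "proc.num[readout]",
}


def parse_item_key(key):
    """Split 'name[p1,p2]' into ('name', ['p1', 'p2']).

    Simplified on purpose: quoted parameters containing commas
    are not handled.
    """
    name, bracket, rest = key.partition("[")
    if not bracket:
        return name, []
    if rest.endswith("]"):
        rest = rest[:-1]
    return name, rest.split(",")


if __name__ == "__main__":
    for parameter, key in ITEM_KEYS.items():
        print(parameter, parse_item_key(key))
```

The remaining groups (voltage, switch counters, readout links) would more likely come from SNMP or custom items, which this sketch does not cover.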
![Page 9: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/9.jpg)
Shortlist
Selection criteria:
1. SNMP
2. Logical grouping
3. Large user community
4. Distributed monitoring

| Name | Agent | SNMP | Syslog | WebApp | Data Storage Method | License |
|---|---|---|---|---|---|---|
| Cacti | No | Yes | Yes | Full Control | RRDtool, MySQL | GPL |
| Icinga | Supported | Via plugin | Via plugin | Full Control | MySQL, PostgreSQL, Oracle Database | GPL |
| Zabbix | Supported | Yes | Yes | Full Control | Oracle, MySQL, PostgreSQL, IBM DB2, SQLite | GPL |
| Zenoss | No | Yes | Yes | Full Control | ZODB, MySQL, RRDtool | GPL |
| + Splunk | Supported | Yes | Yes | Full Control | Raw files | Commercial |
| + MonALISA | | | | | | |
Source: http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
![Page 10: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/10.jpg)
Tools comparison

| Name | Data gathering | Graphing | Triggering | Scalability | Data Storage | Extensibility |
|---|---|---|---|---|---|---|
| Icinga | Agent | 0 | 1 | 1 – up to 1000 hosts | DB | 2 |
| Cacti | Server | 2 | 0 | 1 – up to 1000 hosts | RRDtool – DB | 2 |
| Zenoss | Server | 1 | 1 | 2 – 1000+ | RRDtool – DB | 1 |
| Zabbix | Agent or Server | 2 | 1 | 2 – 1000+ | DB | 2 |
| Splunk | Agent | 2 | 1 | 2 – 1000+ | Raw files | 2 |
| MonALISA | Agent | 2 | 1 | 2 – 1000+ | DB | 2 |

Scores: 0-1: Absent / Present. 0-1-2: Absent / Present but not good / Good.
![Page 11: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/11.jpg)
![Page 12: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/12.jpg)
![Page 13: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/13.jpg)
![Page 14: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/14.jpg)
![Page 15: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/15.jpg)
Tools comparison

| Name | SNMP | Community | Granularity | Auto Discovery | Free |
|---|---|---|---|---|---|
| Icinga | 2 | 2 | 1 - 1 minute/metric | 2 | 1 |
| Cacti | 2 | 2 | 1 - 1 minute/metric | 1 | 1 |
| Zenoss | 1 | 1 | 1 - 1 minute/collector | 2 | 1 |
| Zabbix | 2 | 2 | 2 - No limit/metric | 2 | 1 |
| Splunk | 2 | 2 | 2 - No limit/metric | 2 | 0 |
| MonALISA | 2 | 1 | 1 - 1 minute/metric | 2 | 1 |

Scores: 0-1: Absent / Present. 0-1-2: Absent / Present but not good / Good.
![Page 16: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/16.jpg)
![Page 17: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/17.jpg)
![Page 18: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/18.jpg)
![Page 19: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/19.jpg)
Tools comparison

| Name | SNMP | Community | Granularity | Auto Discovery | Free | Total |
|---|---|---|---|---|---|---|
| Icinga | 2 | 2 | 1 - 1 minute/metric | 2 | 1 | 12 |
| Cacti | 2 | 2 | 1 - 1 minute/metric | 1 | 1 | 12 |
| Zenoss | 1 | 1 | 1 - 1 minute/collector | 2 | 1 | 11 |
| Zabbix | 2 | 2 | 2 - No limit/metric | 2 | 1 | 16 |
| Splunk | 2 | 2 | 2 - No limit/metric | 2 | 0 | 15 |
| MonALISA | 2 | 1 | 1 - 1 minute/metric | 2 | 1 | 14 |

Scores: 0-1: Absent / Present. 0-1-2: Absent / Present but not good / Good.
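The Total column can be reproduced by adding up the numeric scores from both comparison tables (Graphing, Triggering, Scalability, Extensibility, SNMP, Community, Granularity, Auto Discovery, Free). The sketch below assumes a plain unweighted sum; the slides do not state the formula, but this assumption matches every total shown. The names `CRITERIA`, `SCORES`, `totals`, and `ranking` are illustrative.

```python
# Scores transcribed from the two "Tools comparison" tables.
CRITERIA = [
    "Graphing", "Triggering", "Scalability", "Extensibility",
    "SNMP", "Community", "Granularity", "Auto Discovery", "Free",
]

SCORES = {
    "Icinga":   [0, 1, 1, 2, 2, 2, 1, 2, 1],
    "Cacti":    [2, 0, 1, 2, 2, 2, 1, 1, 1],
    "Zenoss":   [1, 1, 2, 1, 1, 1, 1, 2, 1],
    "Zabbix":   [2, 1, 2, 2, 2, 2, 2, 2, 1],
    "Splunk":   [2, 1, 2, 2, 2, 2, 2, 2, 0],
    "MonALISA": [2, 1, 2, 2, 2, 1, 1, 2, 1],
}


def totals(scores):
    """Unweighted sum of all criteria per tool (assumed formula)."""
    return {name: sum(values) for name, values in scores.items()}


def ranking(scores):
    """Tools sorted by total score, highest first."""
    return sorted(totals(scores).items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in ranking(SCORES):
        print(f"{name:10s} {total}")
```

Under this assumption Zabbix ranks first with 16 points, followed by Splunk (15) and MonALISA (14).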
![Page 20: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/20.jpg)
![Page 21: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/21.jpg)
Zabbix characteristics
• Graphing
• Full configuration GUI
• Many ways of data retrieval (scalability)
![Page 22: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/22.jpg)
Zabbix characteristics
![Page 23: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/23.jpg)
Zabbix characteristics
![Page 24: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/24.jpg)
![Page 25: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/25.jpg)
![Page 26: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/26.jpg)
![Page 27: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/27.jpg)
![Page 28: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/28.jpg)
Zabbix footprint tests
![Page 29: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/29.jpg)
Zabbix footprint tests
![Page 30: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/30.jpg)
Zabbix footprint tests
![Page 31: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/31.jpg)
Zabbix footprint tests
![Page 32: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/32.jpg)
Zabbix dashboard and usage
![Page 33: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/33.jpg)
Zabbix dashboard and usage
![Page 34: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/34.jpg)
Zabbix dashboard and usage
![Page 35: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/35.jpg)
Zabbix dashboard and usage
![Page 36: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/36.jpg)
Conclusion
The evaluation of different monitoring tools resulted in the selection of Zabbix.
Zabbix meets the ALICE DAQ needs.
Zabbix will be in production for Run 2.
![Page 37: System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e885503460f94b8d236/html5/thumbnails/37.jpg)
Thanks. Questions?