VO-specific systems for the monitoring of the LHC computing activities on the GRID
EGEE-III INFSO-RI-222667
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Julia Andreeva, CERN (IT/GS)
NEC09, September 2009, Varna, Bulgaria
Outline
Monitoring from the VO perspective, motivation
Overview of the existing VO-specific monitoring systems and their role in operating the WLCG infrastructure
Experiment Dashboard as an example of a common application used by all LHC VOs.
High-level cross-VO view based on data from VO-specific systems
Summary
NEC09, Varna, Julia Andreeva (CERN, IT/GS) 2
Monitoring from the VO perspective: why is it so important?
We are still accumulating practical experience in operating the Grid infrastructure
We are not yet aware of all the possible problems which could affect the infrastructure as a whole or its individual components
As a consequence, we do not yet have enough knowledge to create a perfect monitoring system which would alert us in any critical situation or, even better, predict such a situation before it happens
All this implies considerable involvement of the user community in the operations
As current experience shows (CCRC08, STEP09 and beyond), as a rule it is the VO communities, in particular people taking computing shifts, who detect problems in the first place
VO monitoring tools are the main monitoring instrument for the moment. They aggregate and promptly incorporate new experience in operating the Grid infrastructure
Main areas covered by the VO monitoring tools
Job processing (sharing and usage of the resources, performance, reasons for failures and the related problems with the Grid services or VO applications involved)
Data transfer (throughput, efficiency, reasons for failures and the related problems with the Grid services involved)
Overall status of sites serving a given VO (site commissioning, computing shifts)
Variety of tools used by the LHC VOs
ALICE
- MonALISA for job processing
- MonALISA and Experiment Dashboard for data transfer
ATLAS
- PanDA and Experiment Dashboard for job processing
- Experiment Dashboard for data transfer
CMS
- ProdMon and Experiment Dashboard for job processing
- PhEDEx for data transfer
LHCb
- DIRAC for job processing
- DIRAC for data transfer
All experiments are using
- SAM and Experiment Dashboard for monitoring of site status and status of the services at the sites
- SLS for monitoring of services at Tier0
ALICE example
The ALICE monitoring system is based on MonALISA.
MonALISA services at all ALICE sites provide site-level monitoring, and a MonALISA repository provides a high-level view at the scope of the ALICE VO
Monitoring of ATLAS DDM
Monitoring of ATLAS DDM is implemented in the Dashboard framework.
The information sources are the ATLAS DDM services at the sites. The data repository is implemented in an ORACLE backend located at CERN.
Widely used by the ATLAS community.
Up to 1K unique visitors per month; more than 100K pages are viewed daily
CMS example
Monitoring of the CMS transfers is coupled with the CMS data distribution system PhEDEx.
It provides information about transfer rate, transfer quality, the status of the queue for transfer requests, etc…
For CMS Job Monitoring see the next talk, by Irina Sidorova
Monitoring of the LHCb computing activities by Dirac
In LHCb both data transfer and job monitoring are provided by DIRAC
Experiment Dashboard as an example of a system used by 4 LHC VOs
Experiment Dashboard is in production for 4 LHC VOs
Widely used by the experiments for their everyday work (3K unique visitors, counted as unique IP addresses, of the CMS production server in August 2009)
Covers the full range of the LHC computing activities
Works transparently across various Grid infrastructures
Developed as a result of the joint effort of the Dashboard team, developers in the LHC experiments and in other monitoring projects, in collaboration with institutes from Taiwan, Russia, France and Great Britain
Collaboration with JINR and other Russian institutions
Russia is actively participating in the WLCG monitoring activity, namely by contributing to the Dashboard project. On the Russian side this work is coordinated by Vladimir Korenkov
Strong contribution from JINR:
Irina Sidorova
Elena Tikhonenko
Sergey Belov
Sergey Mitsyn
Alexander Uzhinskiy
Andrey Nechaevskiy
Among our JINR colleagues there are many young developers who recently graduated from Dubna University
We very much hope that this collaboration will continue
Experiment Dashboard applications
Generic applications:
•Job Monitoring
•Task monitoring for the analysis users
•Site availability based on SAM tests
•Site Status Board
VO-specific applications:
•ALICE Data Transfer Monitoring
•ATLAS Data Management Monitoring
•ATLAS Production Monitoring
•Central Repository for Production Monitoring Data for CMS
Development principles
Do not develop and deploy new sensors unless nothing is in place for a given purpose
Where possible use common solutions (technology and implementation). All Dashboard applications, regardless of their functionality and information sources, are developed in the Dashboard framework
Involve users in the development process
Experiment Dashboard Framework
[Diagram: Experiment Dashboard framework. Components shown: information sources; Dashboard data collecting agents; data storage and aggregation; DB Access Layer (DAO); UI; machine-readable format publisher; other applications; Dashboard agents.]
The system is modular. This allows a flexible approach when implementing the needs of the customers
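The modular layout named on this slide can be illustrated with a minimal sketch. It is not the Dashboard code: the class `InMemoryDAO`, the functions `run_collector` and `publish_json`, the site names and the record fields are all hypothetical, and an in-memory list stands in for the real database.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class InMemoryDAO:
    """Hypothetical stand-in for the DB Access Layer (DAO)."""
    rows: List[dict] = field(default_factory=list)

    def insert(self, row: dict) -> None:
        self.rows.append(row)

    def select(self, **filters) -> List[dict]:
        # Return the rows whose fields match every given filter.
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in filters.items())]


def run_collector(source: Callable[[], List[dict]], dao: InMemoryDAO) -> int:
    """Data collecting agent: read one information source, store its records."""
    records = source()
    for record in records:
        dao.insert(record)
    return len(records)


def publish_json(dao: InMemoryDAO, **filters) -> str:
    """Machine-readable publisher: serialise the selected rows as JSON."""
    return json.dumps(dao.select(**filters), sort_keys=True)


def fake_source() -> List[dict]:
    # Invented records; a real information source would be an experiment service.
    return [
        {"site": "SITE_A", "vo": "cms", "jobs": 120},
        {"site": "SITE_B", "vo": "atlas", "jobs": 80},
    ]


dao = InMemoryDAO()
stored = run_collector(fake_source, dao)
cms_view = publish_json(dao, vo="cms")
```

Because collectors and publishers only talk to the DAO, any one of them can be replaced or left out for a given application, which is the flexibility the slide refers to.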
Experiment Dashboard Framework (Examples)
[Diagram: Experiment Dashboard framework, same components as on the preceding framework slide.]
For ATLAS Data Management Monitoring all components are implemented
Experiment Dashboard Framework (Examples)
[Diagram: Experiment Dashboard framework, same components as on the preceding framework slide.]
For CMS Production Monitoring, Dashboard is used to store, aggregate and archive data and to publish it in XML format, while the UI is developed by the CMS Production Team
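As an illustration of the "aggregate, then publish in XML" split described here, the following sketch counts jobs per site and serialises the result. The element and attribute names (`production_summary`, `site`, `done`, `failed`) and the record fields are assumptions made for this example and do not reflect the actual CMS schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def aggregate_by_site(job_records):
    """Count finished and failed jobs per site."""
    totals = defaultdict(lambda: {"done": 0, "failed": 0})
    for rec in job_records:
        key = "done" if rec["status"] == "done" else "failed"
        totals[rec["site"]][key] += 1
    return dict(totals)


def to_xml(summary):
    """Serialise the per-site summary; sites are emitted in sorted order."""
    root = ET.Element("production_summary")
    for site in sorted(summary):
        ET.SubElement(
            root, "site",
            name=site,
            done=str(summary[site]["done"]),
            failed=str(summary[site]["failed"]),
        )
    return ET.tostring(root, encoding="unicode")


# Invented job records for the example.
records = [
    {"site": "SITE_A", "status": "done"},
    {"site": "SITE_A", "status": "done"},
    {"site": "SITE_A", "status": "failed"},
    {"site": "SITE_B", "status": "done"},
]
summary = aggregate_by_site(records)
xml_text = to_xml(summary)
```

A separately developed UI, like the one the CMS Production Team maintains, would only need to parse such a document and would not depend on how the data are stored.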
Experiment Dashboard Framework (Examples)
[Diagram: Experiment Dashboard framework, same components as on the preceding framework slide.]
For the new SAM portal, information is not imported into the Dashboard DB. Some additional tables are created in the SAM DB, and availability calculations are implemented inside the ORACLE SAM instance.
Dashboard is used only to create the monitoring display and to publish data in machine-readable format
Make users take part in the development
Monitoring applications are successful when they are developed in close collaboration with the user community. Good examples are Site Status Board and the Dashboard Site Availability application based on SAM tests
Over the last year the CMS experiment put a lot of effort into the site commissioning activity
Monitoring is a vital component of this process
New applications were developed in close collaboration between the Dashboard team and members of the CMS community involved in the site commissioning activity
Initially developed for CMS, the Dashboard Site Availability application was requested by other LHC VOs. It is now in production for all 4 LHC VOs.
The same holds for Site Status Board: developed for CMS, it was later requested by ALICE and LHCb.
Dashboard plots demonstrating improvement of the quality of the sites used by CMS.
Make users take part in the development
Site Status Board
•It is the users (people taking part in computing shifts and the site commissioning activity) who define the set of columns, their content, which metrics are considered for the overall status of the site, the validity interval for a given metric, which columns are shown in the UI by default, alternative views, etc…
•Dashboard provides a framework to be filled in with the customized information.
•Historical information as well as straightforward navigation to the primary information source is available
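How user-defined metrics and validity intervals could combine into an overall site status can be sketched as follows. This is an illustration under stated assumptions rather than the Site Status Board algorithm: the `Metric` fields, the status names, the severity ordering and the rule "expired measurements count as unknown" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Metric:
    name: str
    status: str            # "ok", "warning" or "error"
    measured_at: datetime
    validity: timedelta    # how long a measurement stays valid
    critical: bool         # whether it counts towards the overall status


# Assumed ordering from least to most severe.
SEVERITY = {"ok": 0, "unknown": 1, "warning": 2, "error": 3}


def effective_status(metric, now):
    """A measurement older than its validity interval is reported as unknown."""
    if now - metric.measured_at > metric.validity:
        return "unknown"
    return metric.status


def overall_status(metrics, now):
    """Worst effective status among the metrics flagged as critical."""
    statuses = [effective_status(m, now) for m in metrics if m.critical]
    if not statuses:
        return "unknown"
    return max(statuses, key=SEVERITY.__getitem__)


now = datetime(2009, 9, 10, 12, 0)
metrics = [
    Metric("job_success", "ok", datetime(2009, 9, 10, 11, 30), timedelta(hours=1), True),
    Metric("transfer_quality", "warning", datetime(2009, 9, 10, 11, 0), timedelta(hours=2), True),
    Metric("sam_availability", "error", datetime(2009, 9, 10, 8, 0), timedelta(hours=1), True),
    Metric("site_comment", "error", datetime(2009, 9, 10, 11, 55), timedelta(hours=1), False),
]
overall = overall_status(metrics, now)
```

In this example the expired `sam_availability` measurement and the non-critical `site_comment` column do not drive the result, so the overall status follows the still-valid critical metrics.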
High-level cross-VO view
The VO-specific monitoring systems work in the scope of a single experiment
Non-expert users or users external to a given VO do not know how to find the required information
It is difficult, if at all possible, to compare and correlate information from different VOs. A global cross-VO view is missing
Recent development aims to solve this problem
The systems providing a high-level view are being designed. They are based on integration of the experiment-specific monitoring systems, the Dashboard framework and the GridMap visualization system
GridMap visualization system
The GridMap visualization tool was developed in the context of the CERN openlab collaboration between CERN and the company EDS
The main motivation for GridMap development is to provide a high-level view of the monitoring data collected from the distributed infrastructure in an intuitive and useful way.
The requirements for visualization of the distributed hierarchical infrastructure and the GridMap visualization are a perfect match
Use cases for GridMap
Multiple use cases have been defined for GridMap:
•GridMap for Experiment Work Flows
•GridMap for status of services defined as critical by the LHC VOs
•GridMap for Site Status Board
One of the main conclusions from analyzing the results of CCRC08 and STEP09 was that sites are somewhat disoriented regarding monitoring. Too many monitoring tools… Which ones to use? Which ones to trust? How to understand whether the VOs served by the site are happy with the site performance?
•Siteview application aims to provide an estimate of site performance from the VO perspective
•http://dashb-siteview.cern.ch/gridmap-vo-siteview/
High-level monitoring system for sites serving LHC VOs
ATLAS
ALICE
CMS
LHCb
•Central repository for common metrics (transfer rate, parallel jobs, success rate, etc…)
•GridMap for a particular site
•Common metrics distributions by time
EGEE'08 - Julia Andreeva (CERN, IT/GS)
Siteview (1/4)
The map is split into 4 groups:
•Overall status of the site from the VO perspective
•Job processing activity
•Incoming data transfer
•Outgoing data transfer
•The size of a cell is defined by the scale of a given activity; its colour is defined by the success rate
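The size and colour rule for a map cell can be sketched in a few lines. This is only one plausible reading of the slide: the linear share of area, the red-to-green colour ramp and the function names `cell_area_fractions` and `success_colour` are assumptions, not the GridMap implementation.

```python
def cell_area_fractions(activity_scale):
    """Share of the map area for each cell, proportional to its activity scale."""
    total = sum(activity_scale.values())
    if total == 0:
        return {name: 0.0 for name in activity_scale}
    return {name: value / total for name, value in activity_scale.items()}


def success_colour(success_rate):
    """Map a success rate in [0, 1] to an (R, G, B) colour between red and green."""
    rate = min(max(success_rate, 0.0), 1.0)   # clamp out-of-range input
    red = round(255 * (1.0 - rate))
    green = round(255 * rate)
    return (red, green, 0)


# Invented numbers: parallel jobs per VO at one site.
fractions = cell_area_fractions({"cms": 300, "atlas": 100})
all_good = success_colour(1.0)
all_bad = success_colour(0.0)
```

With these inputs the CMS cell takes three quarters of the group area, and a cell with no failures is drawn pure green.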
Siteview (2/4)
Helps users navigate to the primary information source
Siteview (3/4)
•Click to get more information about failures
Siteview (4/4)
•Click to get to the primary information source
Integration with Google Earth
Experiment-specific monitoring systems provide the input data
Dashboard agents publish this information in the KML format
Strong contribution to the development from Sergey Mitsyn (JINR)
The application will be shown during the LHC demo at the EGEE conference in Barcelona
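Publishing site data as KML, as the Dashboard agents are described as doing, might look like the sketch below. The site names, coordinates, the `running_jobs` field and the function `sites_to_kml` are invented for illustration; only the KML 2.2 namespace and the longitude,latitude,altitude coordinate order come from the KML standard.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name):
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def sites_to_kml(sites):
    """Build a KML document with one Placemark per site."""
    ET.register_namespace("", KML_NS)   # emit KML as the default namespace
    kml = ET.Element(_tag("kml"))
    doc = ET.SubElement(kml, _tag("Document"))
    for site in sites:
        placemark = ET.SubElement(doc, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = site["name"]
        ET.SubElement(placemark, _tag("description")).text = (
            f"running jobs: {site['running_jobs']}"
        )
        point = ET.SubElement(placemark, _tag("Point"))
        # KML coordinates are written as longitude,latitude,altitude.
        ET.SubElement(point, _tag("coordinates")).text = (
            f"{site['lon']},{site['lat']},0"
        )
    return ET.tostring(kml, encoding="unicode")


# Hypothetical sites with made-up coordinates and job counts.
sites = [
    {"name": "SITE_A", "lon": 6.05, "lat": 46.23, "running_jobs": 1200},
    {"name": "SITE_B", "lon": 37.17, "lat": 56.74, "running_jobs": 350},
]
kml_text = sites_to_kml(sites)
```

Google Earth can load a file with this content directly, so the monitoring side only has to regenerate the document when the input data change.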
Summary
•Practical experience in operating the Grid infrastructure and in its use by the LHC community (in particular during CCRC08 and STEP09) proved that VO-specific monitoring systems are a vital part of the operations and are currently the main source of monitoring information
•With a wide range of VO-specific monitoring tools in place, we were still missing a high-level view of the computing activities of all the LHC experiments together, both at the global and at the site level
•These issues are being addressed in the current development.
•Siteview application is being developed and evaluated by the LHC community