Tier3 monitoring. Initial issues. Danila Oleynik. Artem Petrosyan. JINR.


Post on 06-Jan-2016


Page 1: Tier3  monitoring. Initial issues

Tier3 monitoring. Initial issues.

Danila Oleynik. Artem Petrosyan.

JINR.

Page 2: Tier3  monitoring. Initial issues

What is a Tier 3 site?

A Tier 3 site is a computing facility used by local groups for their analysis work. Working definition:

– “Non-pledged resources”
– “Analysis facilities” at your University/Institute/...

Specifics:
– Final analysis rather than simulation and reconstruction
– Local control rather than ATLAS central control
– Operational load falls more on local resources (i.e. people) than on the central team (i.e. other people)

Page 3: Tier3  monitoring. Initial issues

Types of Tier 3’s

• Tier 3 gs (grid services)
– Same services as Tier 2

• Tier 3 w (workstation)
– Interactive workstation with ATLAS software
– No batch system
– Can submit grid jobs
– Data retrieved using client tools (dq2-get)

Page 4: Tier3  monitoring. Initial issues

Tier 3G (most common Tier 3)

• Interactive nodes
• Can submit grid jobs
• Batch system
• ATLAS code available
• Client tools used to fetch data (dq2-ls, dq2-get)
• Storage can be one of two types:
– Located on the worker nodes
– Located on dedicated file servers

Page 5: Tier3  monitoring. Initial issues

T3g Software Services

• Generic services (services used to maintain the cluster)
– LDAP: a directory service, used in this case to manage the users on the cluster
– Ganglia: infrastructure monitoring system
– Web server
– NFS (Network File System)

• Job submission services
– PROOF (most common)
– Arcond (more exotic)

• Distributed storage service
– XRootD

Page 6: Tier3  monitoring. Initial issues

Monitoring issues for Tier3

• Local monitoring
• Infrastructure monitoring
• Monitoring of the job submission system
• Monitoring of the data management (storage) system
– Requirements: easy installation and support

• Global monitoring (monitoring of all Tier 3 activities)

• We still have no full picture of which data should be represented on this layer

Page 7: Tier3  monitoring. Initial issues

Local Tier3 Monitoring

Ganglia and Nagios are the most recommended systems for infrastructure monitoring (many Tier 3 sites in the US already use Ganglia). They provide a wide range of monitoring parameters. A significant advantage of these systems is that they are built on plug-in technology, so monitoring a specific parameter only requires developing a sensor.

Most Tier 3 sites will use PROOF as the job management system and XRootD as the data management system. Both systems have initial interfaces for such monitoring. PROOF and XRootD can be monitored by the MonALISA monitoring system, but we have collected differing opinions about MonALISA. It may be too heavy a solution for small sites; we are still investigating.
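To illustrate what "developing a sensor" can look like, here is a minimal sketch of a check written to the Nagios plug-in convention (one status line on standard output, exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN). The monitored quantity (disk usage), the thresholds, and the function names `classify_usage` and `check_disk` are assumptions chosen for illustration; they are not taken from the presentation.

```python
import shutil
import sys

# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_usage(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios status code and a one-line message."""
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def check_disk(path="/"):
    """Measure disk usage for `path` (illustrative sensor) and classify it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The path could not be inspected, so the state is unknown.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return classify_usage(100.0 * usage.used / usage.total)


if __name__ == "__main__":
    code, message = check_disk(sys.argv[1] if len(sys.argv) > 1 else "/")
    print(message)
    sys.exit(code)
```

A site-specific sensor (for example, one that inspects a storage area) would replace only the measurement in `check_disk`; the status line and exit code convention stay the same.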

Page 8: Tier3  monitoring. Initial issues

Global Tier3 monitoring

• Because we do not yet know which data Tier 3 sites should present in global monitoring, or what the data flow will be, we can talk only about initial concepts of this service.

• This service should be based on an agent model. Each agent works with the local monitoring system, collects and aggregates the needed data, and sends it to a central monitoring service. Depending on the data flow, different technologies can be used (REST web services, ActiveMQ). The central service should be able to collect and store this information from all sites, and should provide different interfaces to the data, both human-oriented and machine-oriented.
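The agent model can be sketched roughly as follows, assuming the REST option and a JSON payload. The metric names, the report layout (`site`, `timestamp`, `metrics`), and the central service URL passed to `send_report` are all hypothetical, since the set of data to publish is not yet defined.

```python
import json
import time
import urllib.request
from statistics import mean


def aggregate(samples):
    """Reduce raw samples {metric: [values]} to min/mean/max summaries.

    Metrics with no samples are left out of the result.
    """
    return {
        name: {"min": min(values), "mean": mean(values), "max": max(values)}
        for name, values in samples.items()
        if values
    }


def build_report(site, samples, timestamp=None):
    """Build the JSON document an agent would send (layout is an assumption)."""
    return json.dumps({
        "site": site,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "metrics": aggregate(samples),
    }, sort_keys=True)


def send_report(url, report, timeout=10):
    """POST the report to a REST endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=report.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Aggregating on the site before sending keeps the traffic to the central service small. With ActiveMQ instead of REST, only `send_report` would change.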

Page 9: Tier3  monitoring. Initial issues

Interfaces for the Tier 3 monitoring system

• Local monitoring system. Ganglia and Nagios provide their own web interfaces. All new parameters collected by these systems will be presented through those interfaces.

• Global monitoring system. To present monitored data at the global level we propose using web-based technology. For this application we are ready to use Ajax technology (jQuery) and Django as the data model layer. Integration with other applications can be done through a REST interface.
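As a rough illustration of serving the same stored data through a machine-oriented and a human-oriented interface, the sketch below uses a plain in-memory class in place of the Django data model layer. The class name `ReportStore` and the `running_jobs` field are hypothetical.

```python
import json


class ReportStore:
    """In-memory stand-in for the central service's data model layer."""

    def __init__(self):
        self._latest = {}  # site name -> most recent report dict

    def add(self, report):
        """Keep only the newest report per site, judged by its timestamp."""
        current = self._latest.get(report["site"])
        if current is None or report["timestamp"] >= current["timestamp"]:
            self._latest[report["site"]] = report

    def as_json(self):
        """Machine-oriented view, e.g. the body of a REST GET response."""
        reports = sorted(self._latest.values(), key=lambda r: r["site"])
        return json.dumps(reports)

    def as_text(self):
        """Human-oriented view: one line per site, sorted by site name."""
        lines = []
        for site in sorted(self._latest):
            report = self._latest[site]
            # 'running_jobs' is an illustrative field, not a defined metric.
            lines.append(
                f"{site}: {report['running_jobs']} running jobs "
                f"at t={report['timestamp']}"
            )
        return "\n".join(lines)
```

In a web application the JSON view would feed both the jQuery front end and other applications, while the text view stands in for a rendered page.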