Page 1: Simply monitor a grid site with Nagios

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

Simply monitor a grid site with Nagios

J. Casey, CERN

E. Imamagic, SRCE

ISGC 2008

Page 2: Simply monitor a grid site with Nagios

ISGC 2008 / Simply monitor a grid site with Nagios 2

Overview

• Nagios
• Nagios-based grid monitoring
• Site monitoring prototype
• Demo
• Current status
• Future work
• Conclusions

Page 3: Simply monitor a grid site with Nagios

Nagios

• Open source monitoring framework
– widely used & actively developed
• Detection of and recovery from host and service problems
• Provides a wide set of basic sensors
– easy to develop custom sensors
• Supports centralized and distributed deployment
• High configurability
– service dependencies, fine-grained notification options
• Web interface
– status view, administration
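
Writing a custom sensor mostly means following the Nagios plugin convention. The plugin prints one status line, optionally followed by `|` and performance data, and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The Python below is a minimal sketch of that convention using a TCP connect test. The script name `check_tcp_sketch.py` and the 1 s / 5 s thresholds are illustrative assumptions, not taken from the slides.

```python
import socket
import sys
import time

# Standard Nagios plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed, warn, crit):
    """Map a response time in seconds to a Nagios status."""
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def check_tcp(host, port, warn=1.0, crit=5.0):
    """Open a TCP connection and return (status, one-line plugin output)."""
    start = time.monotonic()
    try:
        # Give up after `crit` seconds; a timeout or refusal counts as CRITICAL
        with socket.create_connection((host, port), timeout=crit):
            pass
    except OSError as exc:
        return CRITICAL, f"CRITICAL - cannot connect to {host}:{port}: {exc}"
    elapsed = time.monotonic() - start
    status = classify(elapsed, warn, crit)
    # Text after '|' is performance data: label=value[unit];warn;crit
    perfdata = f"time={elapsed:.3f}s;{warn};{crit}"
    return status, f"{STATUS_NAMES[status]} - {host}:{port} answered in {elapsed:.3f}s | {perfdata}"


def main(argv):
    if len(argv) != 3 or not argv[2].isdigit():
        print(f"UNKNOWN - usage: {argv[0]} HOST PORT")
        return UNKNOWN
    status, output = check_tcp(argv[1], int(argv[2]))
    print(output)
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Nagios only looks at the exit code and the printed line. Any such script can therefore be registered as a check command without changes to Nagios itself.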

Page 4: Simply monitor a grid site with Nagios

Nagios-based Grid Monitoring

• Monitoring CRO-GRID Infrastructure (2004-2006)
– Globus Toolkit Pre-WS & WS, UNICORE, other services
– active recovery of services
– http://www.cro-ngi.hr
• Monitoring EGEE resources in Central Europe (CE)
– core services since mid-2006
– all CE sites for first-line support since September 2006
– http://nagios.ce-egee.org
• Grid Services Monitoring (GSM) WG
– site monitoring prototype, mid-2007
– http://crnjak.srce.hr/nagios (egee.srce.hr)
– https://pps-monitoring.cern.ch/nagios (CERN-PPS)

Page 5: Simply monitor a grid site with Nagios

Site Monitoring Prototype

[Diagram: site monitoring prototype. The monitoring server runs Nagios together with NCG, NPM, the probe wrapper, standard probes, remote gatherers, credential refresh, the publisher and the Nagios web interface. External parties: site nodes (site BDII, CE, SE, LFC, …), SAM, MyProxy, VOMS, external applications and site admins. Labeled flows: get site's & nodes information, get nodes information, live node checks, probe descriptions, Nagios configuration, long-lived MyProxy certificate, refresh proxy, get VOMS proxy, VOMS proxy certificate, service checks, get remote results, get Nagios results, get site status, issue alarms.]
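
The slides do not say how the remote gatherers hand results to Nagios. This sketch assumes one common mechanism, Nagios passive checks. Each result is written to the Nagios command file as an external command of the form `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The `RemoteResult` type and the host and service names are hypothetical. The command-file path depends on the local installation.

```python
import time
from dataclasses import dataclass

# Nagios return codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN


@dataclass
class RemoteResult:
    """A check result fetched from an external system (hypothetical fields)."""
    host: str
    service: str
    status: int
    output: str


def to_external_command(result, timestamp=None):
    """Format one result as a PROCESS_SERVICE_CHECK_RESULT external command."""
    ts = int(time.time() if timestamp is None else timestamp)
    # A newline ends the command, so fold multi-line output onto one line
    output = " ".join(result.output.splitlines())
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{result.host};{result.service};{result.status};{output}\n")


def submit(results, command_file):
    """Append results to the Nagios command file (normally a named pipe read by Nagios)."""
    with open(command_file, "a") as pipe:
        for result in results:
            pipe.write(to_external_command(result))
```

Here is an example call with a placeholder path: `submit([RemoteResult("ce01.example.org", "CE-JobSubmit", 0, "OK - job ran")], "/usr/local/nagios/var/rw/nagios.cmd")`. The matching services must already be defined in Nagios and must accept passive checks.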

Page 6: Simply monitor a grid site with Nagios

Grid Probes

• Provided by SRCE, CERN, OSG
• Security facilities & services
– CA distribution, certificate lifetime, MyProxy
• Monitoring & information services
– R-GMA, BDII, MDS, GridICE
• Job management services
– Globus Gatekeeper, RB, WMS, WMProxy, job matching
• File management services
– GridFTP, SRM, DPNS, LFC, FTS
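
The certificate-lifetime probe listed above boils down to comparing a certificate's expiry date with the current time. This minimal sketch assumes the expiry is already available as an OpenSSL-style `notAfter` string, the format returned by Python's `ssl.SSLSocket.getpeercert()`. The 14-day and 3-day thresholds are illustrative.

```python
import ssl
from datetime import datetime, timedelta, timezone

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios return codes


def parse_not_after(value):
    """Convert e.g. 'Feb 29 12:00:00 2008 GMT' to an aware UTC datetime."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)


def check_cert_lifetime(not_after, now=None, warn_days=14, crit_days=3):
    """Return (status, message) for a certificate expiring at `not_after`."""
    if now is None:
        now = datetime.now(timezone.utc)
    days = (not_after - now) / timedelta(days=1)  # remaining lifetime as a float
    if days <= 0:
        return CRITICAL, f"CRITICAL - certificate expired {-days:.1f} days ago"
    if days <= crit_days:
        return CRITICAL, f"CRITICAL - certificate expires in {days:.1f} days"
    if days <= warn_days:
        return WARNING, f"WARNING - certificate expires in {days:.1f} days"
    return OK, f"OK - certificate valid for {days:.1f} more days"
```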

Page 7: Simply monitor a grid site with Nagios

Standard Components

• Specifications defined by the GSM WG
• Probe wrapper
– enables integration of standardized probes
– Grid Monitoring Probes Specification
– https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringProbeSpecification
• Publisher & remote gatherers
– integration with other tools
– Grid Monitoring Data Exchange Standard
– https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringDataExchangeStandard
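
A probe wrapper sits between a standardized probe and Nagios. It runs the probe, parses its output, and converts the result to a Nagios status line and return code. This sketch assumes a simplified reading of the Grid Monitoring Probes Specification: a probe prints `key: value` lines such as `metricStatus` and `summaryData`, and ends each metric record with a line containing `EOT`. Multi-line values are not handled, so check the exact format against the specification linked above.

```python
import subprocess
import sys

# Nagios return codes keyed by the status names a probe may report
STATUS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def parse_records(text):
    """Split 'key: value' probe output into records, each ended by an 'EOT' line."""
    records, current = [], {}
    for line in text.splitlines():
        if line.strip() == "EOT":
            if current:
                records.append(current)
            current = {}
        elif ":" in line:
            # Split on the first ':' only, so URIs with ports survive intact
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:  # tolerate a missing final EOT
        records.append(current)
    return records


def to_nagios(record):
    """Convert one metric record to (return code, Nagios status line)."""
    name = record.get("metricStatus", "").upper()
    if name not in STATUS_CODES:
        name = "UNKNOWN"
    summary = record.get("summaryData", "no summaryData in probe output")
    return STATUS_CODES[name], f"{name}: {summary}"


def run_probe(argv, timeout=120):
    """Run a probe command and translate only its first metric record for Nagios."""
    if not argv:
        return STATUS_CODES["UNKNOWN"], "UNKNOWN: no probe command given"
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return STATUS_CODES["UNKNOWN"], f"UNKNOWN: probe could not be run: {exc}"
    records = parse_records(proc.stdout)
    if not records:
        return STATUS_CODES["UNKNOWN"], "UNKNOWN: probe produced no metric records"
    return to_nagios(records[0])


if __name__ == "__main__":
    # Usage: probe_wrapper.py PROBE [PROBE-ARGS...]
    code, line = run_probe(sys.argv[1:])
    print(line)
    sys.exit(code)
```

Keeping the translation in one wrapper means probes written to the specification never need to know about Nagios exit codes.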

Page 8: Simply monitor a grid site with Nagios

Nagios Config Generator

• Uses multiple information sources
– SAM, BDII, active heuristic checks
• Modular approach
– plugging in additional information sources
– integration with other monitoring systems (e.g. LEMON)
• User-defined rules
– configuration tuning for non-standard grid sites
• Standalone configuration
– integration with an existing Nagios server
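
At its core, a configuration generator turns a site inventory into Nagios object definitions. This minimal sketch assumes the inventory, a mapping from node name to service types, has already been gathered from sources such as SAM or the BDII. The `CHECKS` table, the check command names and the host names are hypothetical. The `generic-host` and `generic-service` templates are assumed to exist in the local Nagios configuration. The `exclude` set stands in for user-defined rules.

```python
# Hypothetical mapping: service type -> list of (service_description, check_command)
CHECKS = {
    "CE": [("CE-JobSubmit", "check_ce_job_submit")],
    "SE": [("SE-SRM-Put", "check_srm_put"), ("SE-SRM-Get", "check_srm_get")],
    "sBDII": [("BDII-Query", "check_bdii_query")],
}


def define(obj_type, attrs):
    """Render one Nagios object definition block."""
    body = "\n".join(f"    {key:<20} {value}" for key, value in attrs.items())
    return f"define {obj_type}{{\n{body}\n}}\n"


def generate(inventory, checks=CHECKS, exclude=frozenset()):
    """Build host and service definitions; `exclude` holds (host, service) pairs to skip."""
    blocks = []
    for host in sorted(inventory):
        blocks.append(define("host", {"use": "generic-host", "host_name": host, "address": host}))
        for service_type in sorted(inventory[host]):
            for description, command in checks.get(service_type, []):
                if (host, description) in exclude:
                    continue  # user-defined rule, e.g. a check that does not apply at this site
                blocks.append(define("service", {
                    "use": "generic-service",
                    "host_name": host,
                    "service_description": description,
                    "check_command": command,
                }))
    return "\n".join(blocks)


if __name__ == "__main__":
    site = {"ce01.example.org": ["CE"], "se01.example.org": ["SE"]}
    print(generate(site, exclude={("se01.example.org", "SE-SRM-Get")}))
```

The output is plain Nagios object configuration. It could be written to its own file and included from an existing Nagios server, in line with the standalone-configuration point above.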

Page 9: Simply monitor a grid site with Nagios

Remote gLite UI

• Avoid installation of grid middleware on the Nagios server
– execute grid probes on an existing gLite UI
– use the Nagios Remote Plugin Executor (NRPE)

[Diagram: Nagios on the Nagios server runs the standard probes on a gLite UI through NRPE; the probes perform service checks against the site nodes (site BDII, CE, SE, LFC, …).]
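
With NRPE, each probe run on the gLite UI is exposed as a named command in the UI's `nrpe.cfg` (`command[name]=command line`). Nagios then calls it with `check_nrpe -H <ui-host> -c <name>`. NRPE rejects client-supplied arguments by default (`dont_blame_nrpe=0`). This sketch therefore assumes one simple approach: generate a separate NRPE command for each probe and target. The probe names, the probes' `-H` option, the plugin directory and the host names are hypothetical.

```python
def nrpe_name(probe, target):
    """One NRPE command name per (probe, target) pair, e.g. check_ce_job_ce01_example_org."""
    return f"{probe}_{target.replace('.', '_').replace('-', '_')}"


def nrpe_cfg(checks, plugin_dir):
    """Lines for nrpe.cfg on the gLite UI; the target is fixed in each command line."""
    return [f"command[{nrpe_name(p, t)}]={plugin_dir}/{p} -H {t}" for p, t in checks]


def nagios_cfg(checks, ui_host, timeout=120):
    """Nagios command and service definitions that reach the UI through check_nrpe."""
    blocks = [
        "define command{\n"
        "    command_name  check_via_glite_ui\n"
        f"    command_line  $USER1$/check_nrpe -H {ui_host} -t {timeout} -c $ARG1$\n"
        "}\n"
    ]
    for probe, target in checks:
        blocks.append(
            "define service{\n"
            "    use                  generic-service\n"
            f"    host_name            {target}\n"
            f"    service_description  {probe}\n"
            f"    check_command        check_via_glite_ui!{nrpe_name(probe, target)}\n"
            "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    checks = [("check_ce_job", "ce01.example.org"), ("check_srm_put", "se01.example.org")]
    print("\n".join(nrpe_cfg(checks, "/opt/grid-probes")))
    print(nagios_cfg(checks, "ui.example.org"))
```

The services are attached to the monitored grid nodes, while the check itself runs on the UI, so host definitions for those nodes must also exist. Only the UI needs grid middleware and credentials. The Nagios server only needs `check_nrpe`.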

Page 10: Simply monitor a grid site with Nagios

Page 11: Simply monitor a grid site with Nagios

Page 12: Simply monitor a grid site with Nagios

Page 13: Simply monitor a grid site with Nagios

[Diagram: SAM, standard probes and NPM]

Page 14: Simply monitor a grid site with Nagios

Current Status

• Three sets of standard probes integrated
– SRCE, CERN, OSG
• Two external monitoring systems
– SAM, ENOC DownCollector
• Several deployments
– CERN-PPS, SRCE, NIKHEF, PIC, IN2P3, ScotGrid
• RPMs in apt and yum repositories
• Installation and configuration manual
• More info
– https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo

Page 15: Simply monitor a grid site with Nagios

Future Work

• NCG development
– providing configuration for multiple sites (regional monitoring)
– providing configuration for multiple VOs
• Integration with global monitoring systems
– ActiveMQ messaging system
– Operations Automation Team mandate
• Enabling “on-host” checks via NRPE
– processes, logs, ports, files, etc.
• Definition of probe description & site topology databases

Page 16: Simply monitor a grid site with Nagios

Conclusions

• Nagios
– highly configurable monitoring framework with notifications, service dependencies, …
– widely used by site admins
• Grid extensions
– integration with existing infrastructure (user certificates, VOMS, GOCDB, SAM)
– probes for key grid services
• Implementation of GSM WG specifications
– probe wrapper, publisher & remote gatherers
– easy integration with existing probes and monitoring systems

Page 17: Simply monitor a grid site with Nagios

Thank You!

Questions?