Writing Nagios Plugins in Python
DESCRIPTION
I introduced Nagios to an organisation in 2004 to track the availability of various servers and network resources. It has since grown into a system validity tool that takes the stress out of help desk. Using Python as a scripting language, I have created a suite of additional Nagios plugins that check:
* real-time entry of market rates
* end of day rate integrity
* common errors in manual spreadsheets
* success of backup processes
* validity conditions in MS SQL databases
* routine tracking of known chronic errors
TRANSCRIPT
Enhancing Nagios with Python Plugins
Maurice Maneschi
Associate Director, Risk Management Systems
Oakvale Capital Limited
Presentation Outline
● Risk Management Systems
● What is Nagios
● Why Python
● What is a plugin
● Specific risks being monitored
● Analysing reports and logs
● Where to next
Risk Management Systems
● A division of five staff
● Supporting three key applications
● Running on eight servers
● Depending on 15+ other boxes spread over 3 LANs
● Five key vendors
Risk Management Systems
● Divisional goals
– Key goal is application management
– Some customer support
– Product innovation
– Project management
– No time for nasty surprises
What is Nagios
● Host, service, network monitoring program
● Open source
● Written in C
● Runs on Linux and Apache
What is Nagios
● Configured with the hosts of a network
– How the hosts are networked
– What key services are on the hosts (“PING”, SMTP, HTTP etc.)
● Application polls these at specified intervals
– From the results of the polls, determines the state of hosts, services and networks
– Alerts sent by email
– Escalation, reporting, statistics and more
Why Python
● Flexible
● Efficient
● Manageable
● Numerous, diverse libraries
● Cross-platform
● Huge number of code samples across the network
What is a plugin
● Executable file
– Takes parameters (preferably)
– Prints a short status message
● Returns an exit status of
– 0 – all OK
– 1 – warning
– 2 – critical
● Stateless
What is a plugin
● Executable Python script
● Code the test
● Print the status line
● Return a status
● Easy!
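The three steps above (code the test, print the status line, return a status) can be sketched as a small script. This is a minimal illustration, not the author's code: the `--value`, `--warn` and `--crit` parameters and the "EXAMPLE" label are hypothetical, and status 3 ("unknown") is the standard Nagios fourth state that the slide does not list.

```python
import argparse

# Exit statuses Nagios understands. 0, 1 and 2 are listed on the slide;
# 3 (unknown) is the standard fourth state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Code the test: map a measured value onto a Nagios status."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(argv):
    """Parse parameters, print one status line, and return the exit status."""
    parser = argparse.ArgumentParser(description="Hypothetical threshold check")
    parser.add_argument("--value", type=float, required=True)
    parser.add_argument("--warn", type=float, default=80.0)
    parser.add_argument("--crit", type=float, default=90.0)
    args = parser.parse_args(argv)

    status = classify(args.value, args.warn, args.crit)
    print("EXAMPLE %s - value is %g" % (LABELS[status], args.value))
    return status

# A real plugin would finish with: sys.exit(main(sys.argv[1:]))
```

The script keeps no state between runs, which matches the "stateless" point above: Nagios calls it, reads one line and the exit status, and decides what to do next.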
Specific risks being monitored
● Customer email to the help desk system has stopped
– Users email issues directly into our help desk system for prioritisation, action and eventually billing
– Spam periodically breaks the import agent
– It's proprietary, so no fix in sight
– Nagios watches the queue using POP3
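One way to watch a mail queue over POP3 is to count the messages still waiting on the server: if the import agent has stalled, the count grows. The sketch below uses the standard library `poplib` module. The thresholds, the function names and the idea that message count is the measure are assumptions, since the slides only say that the queue is watched using POP3.

```python
import poplib

OK, WARNING, CRITICAL = 0, 1, 2


def queue_status(message_count, warn=5, crit=20):
    """Map the number of waiting messages onto a status and a status line.

    The thresholds are illustrative; a growing backlog suggests the
    import agent has stopped collecting mail.
    """
    if message_count >= crit:
        return CRITICAL, "CRITICAL - %d messages waiting" % message_count
    if message_count >= warn:
        return WARNING, "WARNING - %d messages waiting" % message_count
    return OK, "OK - %d messages waiting" % message_count


def count_waiting(host, user, password, timeout=10):
    """Ask the POP3 server how many messages are queued, without retrieving them."""
    box = poplib.POP3(host, timeout=timeout)
    try:
        box.user(user)
        box.pass_(password)
        count, _size = box.stat()  # stat() returns (message count, mailbox size)
        return count
    finally:
        box.quit()
```

A plugin built on this would call `count_waiting(...)`, pass the result to `queue_status(...)`, print the message and exit with the status.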
Specific risks being monitored
● Ratefeed is missing some rates
– Rates feed into our system from Reuters via MS Excel
– Some rates are critical, and human intervention is required if they are missing
– Other rates are important, but are just tracked when missing
– Nagios watches the MS Excel worksheet with the “unreliable rates”
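The split between critical and important rates maps naturally onto the critical and warning statuses. The sketch below shows only that classification step and assumes the worksheet has already been read into a dictionary of rate name to cell value. The reading step is left out, and the rate names and the markers treated as "missing" are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2

# Cell contents treated as "no rate". The exact markers are an assumption.
MISSING = (None, "", "#N/A")


def check_rates(rates, critical_names, important_names):
    """Classify missing rates.

    rates maps a rate name to the value read from the sheet.
    A missing critical rate needs human intervention (CRITICAL);
    a missing important rate is only tracked (WARNING).
    """
    missing_critical = sorted(n for n in critical_names if rates.get(n) in MISSING)
    missing_important = sorted(n for n in important_names if rates.get(n) in MISSING)
    if missing_critical:
        return CRITICAL, "CRITICAL - missing: " + ", ".join(missing_critical)
    if missing_important:
        return WARNING, "WARNING - missing: " + ", ".join(missing_important)
    watched = len(critical_names) + len(important_names)
    return OK, "OK - all %d watched rates present" % watched
```

Naming the missing rates in the status line means the alert email itself tells the help desk which rate to chase.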
Specific risks being monitored
● Rates must be inserted regularly
– Insertion process has numerous dependencies
– Moving target – causes of failure change over time
– Focus on the end point – are the rates in the database?
– Nagios queries the databases and alerts on old or missing rates
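Focusing on the end point can be as simple as asking the database for its newest rate and comparing the age with a threshold. The sketch below uses `sqlite3` from the standard library as a stand-in so that it runs anywhere; the talk's databases are MS SQL. The `rates` table, the `inserted_at` column (seconds since the epoch) and the thresholds are all hypothetical.

```python
import sqlite3
import time

OK, WARNING, CRITICAL = 0, 1, 2


def check_latest_rate(conn, now, warn_age=600, crit_age=1800):
    """Alert when the newest row in the (hypothetical) rates table is old or absent."""
    latest = conn.execute("SELECT MAX(inserted_at) FROM rates").fetchone()[0]
    if latest is None:
        return CRITICAL, "CRITICAL - no rates in the database"
    age = int(now - latest)
    if age >= crit_age:
        return CRITICAL, "CRITICAL - newest rate is %d s old" % age
    if age >= warn_age:
        return WARNING, "WARNING - newest rate is %d s old" % age
    return OK, "OK - newest rate is %d s old" % age


# Demonstration against an in-memory database with one rate inserted a minute ago.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (name TEXT, value REAL, inserted_at INTEGER)")
now = int(time.time())
conn.execute("INSERT INTO rates VALUES ('AUDUSD', 0.75, ?)", (now - 60,))
status, message = check_latest_rate(conn, now)
print(message)  # OK - newest rate is 60 s old
```

Because the check looks only at the result, it keeps working when the causes of failure upstream change.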
Specific risks being monitored
● External source of dealing information
– Fed in through the FIX protocol
– Numerous failure points being monitored on a (Windows) server
– Monitor process must check in with Nagios every 10 minutes
– Using passive and active checks
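A passive check is a result that the monitored process pushes to Nagios instead of waiting to be polled. Nagios accepts such results as `PROCESS_SERVICE_CHECK_RESULT` lines written to its external command file. The sketch below formats one such line; the host and service names are hypothetical, and how the Windows server delivers its results to the Nagios box is not stated in the slides, so the `submit` helper assumes a script running on the Nagios host itself.

```python
import time


def passive_result(host, service, status, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, status, output)


def submit(command_file, line):
    """Write the command to the Nagios external command file (a named pipe).

    The path is whatever the command_file setting in nagios.cfg points to.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Example with hypothetical names and a fixed timestamp:
line = passive_result("fixserver", "FIX feed", 0, "OK - heartbeat", now=1200000000)
print(line, end="")
```

The "check in every 10 minutes" rule pairs well with the Nagios freshness settings (`check_freshness` and `freshness_threshold` on the service): if no passive result arrives in time, Nagios runs the service's active check command, which can simply report a failure.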
Specific risks being monitored
● Quick passive check
Specific risks being monitored
● Successful backups
● Successful scheduled tasks
● Database comparisons
● Common errors
– Password server on web site
– Known failure point on an MS Excel worksheet
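The slides do not say how backup success is verified. One simple approach, sketched below as an assumption, is to check that the backup file exists, is not empty and was written recently. The default limits and the function name are hypothetical.

```python
import os
import time

OK, WARNING, CRITICAL = 0, 1, 2


def check_backup(path, max_age_hours=26, min_bytes=1, now=None):
    """Report on a backup file: missing or empty is critical, stale is a warning."""
    now = time.time() if now is None else now
    if not os.path.isfile(path):
        return CRITICAL, "CRITICAL - backup file %s not found" % path
    info = os.stat(path)
    if info.st_size < min_bytes:
        return CRITICAL, "CRITICAL - backup file %s is only %d bytes" % (path, info.st_size)
    age_hours = (now - info.st_mtime) / 3600.0
    if age_hours > max_age_hours:
        return WARNING, "WARNING - backup is %.1f hours old" % age_hours
    return OK, "OK - backup is %.1f hours old" % age_hours
```

A limit of 26 hours gives a nightly job a little slack before the warning fires.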
Extra enhancements to Nagios
● High-level view of systems health
● Audio alerts and SMSes from UTbox.net
● Status screen on monitor PC
● Syslogd for firewall
● Script reuse for rate checks
● Ad hoc system problems
– Currently tracking WAN failures
Analysing reports and logs
● Screen saver often sufficient
● Summary views
Where to next
● Low-spec PC
● Nagios is in several distro repositories
– I compile from the source
● Allow a day at least to configure Nagios
– Don't expect to install and switch it on
● Tuning Nagios is an ongoing job
Further information
● Nagios: http://www.nagios.org
● Python: http://www.python.org
– pyexcelerator, pymssql, freetds from Sourceforge
● Oakvale Capital: http://www.oakvale.com
● Code samples: http://www.redwaratah.com/wiki/index.php?title=Nagios_and_Python
● Maurice Maneschi: [email protected]