![Page 1: Automated System Monitoring - National Radio …jmalone/talks/LSP_monitoring.pdf · Automated System Monitoring Josh Malone jmalone@nrao.edu Systems Administrator National Radio Astronomy](https://reader033.vdocuments.mx/reader033/viewer/2022051722/5aa4ff937f8b9ab4788c981d/html5/thumbnails/1.jpg)
Automated System Monitoring
Josh Malone
jmalone@nrao.edu
Systems Administrator
National Radio Astronomy Observatory
Charlottesville, VA
https://blogs.nrao.edu/jmalone

WHAT IS AUTOMATED MONITORING?

Automated Monitoring Workflow

Monitoring Packages: Open Source
• Opsview Core
• Core
• Pandora FMS
• Naemon
• Captialware ServerStatus
• Sensu
All Trademarks and Logos are property of their respective trademark or copyright holders and are used by permission or fair use for education. Neither the presenter nor the conference organizers are affiliated in any way with any companies mentioned here.

Monitoring Packages: Commercial
• Nagios XI
• Groundwork
• PRTG network monitor
• CopperEgg
• WhatsUp Gold
• op5 (Naemon)
• Sensaphone (IMS 4000)
• Statseeker

What can monitoring do for you?
• Spot small problems before they become big ones
• Checklist when restoring from a power outage
• Learn about outages before your users do
• Gives you better problem reports than users
• Problems you might never spot otherwise
  • Failed HDDs in RAIDs
  • Full /var partitions
  • Logs not rotating
  • System temperature rising

With Monitoring
• DHCP out of leases
• DHCP server down
• DNS server not responding
• Ethernet switch down
• ISP link down / saturated
Without Monitoring
“The Internet’s down - fix it!!!”

With Monitoring
• Connectivity issues
• Web server down
• Apache not running
• Web server disk full
• Server load too high
Without Monitoring
“ZOMG! Our web site is down! O Noes!!!”

What can monitoring do for you?
• Capacity planning
  • Performance data can generate graphs of utilization
    • RAM, Disk, etc.
• Availability reports - CAUTION
  • Easy to generate -- even easier to generate wrong
  • Make sure your configurations actually catch problems
  • Will also include problems with Nagios itself :(
  • If you’re going to quote your availability numbers (SLAs, etc.) make sure you understand what you’re actually monitoring.
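To make the caution about availability numbers concrete, here is a minimal sketch of the arithmetic, assuming a reporting window measured in seconds. The function name `availability` and the `count_unknown_as_down` switch are hypothetical, chosen only to show how time when the monitor itself was not checking can shift the reported figure.

```python
def availability(total_s, down_s, unknown_s=0.0, count_unknown_as_down=False):
    """Return availability as a percentage of the reporting window.

    total_s   -- length of the reporting window in seconds
    down_s    -- seconds the service was confirmed down
    unknown_s -- seconds with no check results (e.g. the monitor was down)
    """
    if total_s <= 0:
        raise ValueError("reporting window must be positive")
    if count_unknown_as_down:
        # Pessimistic view: unmonitored time is charged as an outage.
        return 100.0 * (total_s - down_s - unknown_s) / total_s
    # Alternative view: only time that was actually monitored counts.
    known_s = total_s - unknown_s
    if known_s <= 0:
        raise ValueError("no monitored time in the window")
    return 100.0 * (known_s - down_s) / known_s


# Hypothetical 30-day month: 1 hour of real outage, 2 hours of monitor downtime.
MONTH = 30 * 24 * 3600
print(round(availability(MONTH, 3600, 7200), 2))
print(round(availability(MONTH, 3600, 7200, count_unknown_as_down=True), 2))
```

The same month produces two different numbers depending on how unmonitored time is treated, which is why it pays to know which convention a report uses before quoting it.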

ENVIRONMENT MONITORING

Environment Monitoring
• Temperature
• Smoke
• Water
• Humidity
• Motion
• Door / closure
• Mains power

Environment Monitoring
• Sensaphone IMS-4000
  • Connect sensors to measure desired metrics
  • IP-based “Nodes” can connect remote sensors
  • Wireless sensors available
  • Notification via POTS line and voice dialer as well as email
  • SNMP support
Use my plugin w/ Nagios!

Environment Monitoring
• ServersCheck
  • Temp, Humidity
  • Wireless (2.4GHz)
• NetBotz
  • Temp, humidity, smoke, water, vibration, doors, cameras

NAGIOS

Nagios
• Open source host / service monitoring package
• “Nagios Ain't Gonna Insist On Sainthood”
• Originally released in 1999 as “NetSaint”
• Available in 2 versions: Core and XI
  • Nagios Core: Open-source, freely available
  • Nagios XI: Commercial
    • Free license for up to 7 hosts
    • Available as source installer or VMware appliance

Nagios Architecture

What’s a plugin?
• Plugins actually run the service or host checks.
• Each plugin monitors a different type of service
• Data from plugin is communicated to Nagios using a (very) simple API
• Plugins can also report “Performance Data” (perfdata) to be graphed or tracked
  • Requires a perfdata add-on (or Nagios XI)
• Plugins can be written in any language
  • Perl plugins can run using Nagios’s embedded Perl interpreter for increased performance
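The “(very) simple API” amounts to one line of text on stdout plus an exit code. Below is a minimal sketch in Python, assuming the conventional plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The names `evaluate`, `check_disk_percent` and `main`, the default `/var` path, and the 80/90 thresholds are all hypothetical and only illustrate the shape of a check.

```python
import shutil
import sys

# Conventional plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage onto a plugin state (thresholds are illustrative)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk_percent(path="/var", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        # Anything that prevents the check from running is UNKNOWN, not CRITICAL.
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {err}"
    used_pct = 100.0 * usage.used / usage.total
    state = evaluate(used_pct, warn, crit)
    return state, f"DISK {LABELS[state]} - {path} is {used_pct:.1f}% full"


def main(argv):
    """Print the status line and return the exit code for the caller to use."""
    path = argv[1] if len(argv) > 1 else "/var"
    code, line = check_disk_percent(path)
    print(line)
    return code
```

To run it as a real plugin, the script would end with `sys.exit(main(sys.argv))` so the monitoring server sees the exit code.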

Where to Monitor a Service?
• Host ping - Is server host alive?
• TCP port 443 - Is Apache listening?
• SSL handshake - Is SSL functional?
• HTTP return code - Is the page found?
• Page load time - Does page load quickly?
• Page content - Is it the right page?
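The layers above can be checked one at a time so that the first failing layer is the one reported. This is a rough sketch using only the Python standard library, not how any particular Nagios plugin does it. The host-ping layer is left out because ICMP usually needs extra privileges, and the helper names (`tcp_open`, `tls_ok`, `fetch`, `web_layers`, `first_failure`) and the 2-second load-time limit are hypothetical.

```python
import socket
import ssl
import time
import urllib.request


def tcp_open(host, port=443, timeout=5.0):
    """Layer: is something listening on the TCP port?"""
    with socket.create_connection((host, port), timeout=timeout):
        return True


def tls_ok(host, port=443, timeout=5.0):
    """Layer: does the SSL/TLS handshake (with certificate checks) succeed?"""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host):
            return True


def fetch(url, timeout=10.0):
    """Fetch the page once; return (status, seconds_elapsed, body_text)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, time.monotonic() - start, body


def web_layers(host, url, expect_text, max_seconds=2.0):
    """Build the ordered (layer name, check) list for one HTTPS page."""
    page = {}

    def http_ok():
        # urlopen raises for 4xx/5xx, which first_failure treats as a failure.
        page["status"], page["elapsed"], page["body"] = fetch(url)
        return page["status"] == 200

    return [
        ("TCP port 443", lambda: tcp_open(host)),
        ("SSL handshake", lambda: tls_ok(host)),
        ("HTTP return code", http_ok),
        ("Page load time", lambda: page["elapsed"] <= max_seconds),
        ("Page content", lambda: expect_text in page["body"]),
    ]


def first_failure(layers):
    """Run the checks in order; return the first failing layer name, or None."""
    for name, check in layers:
        try:
            if not check():
                return name
        except (OSError, ValueError):
            # Socket, TLS and urllib errors are all OSError subclasses.
            return name
    return None
```

Calling `first_failure(web_layers("www.example.org", "https://www.example.org/", "Example"))` would return `None` when every layer passes, or the name of the lowest layer that failed.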

Custom Plugins
• Nagios can monitor anything you can write a script to check
• Simple API: just write text to stdout and exit with a value
• You can write plugins in ANY language you choose!
  • bash, python, tcl, expect
  • perl (Nagios has embedded perl interpreter for speed)
  • C, C++
• Huge collection of plugins available at:
  http://exchange.nagios.org
  https://www.monitoringexchange.org
• Be wary of some community plug-ins!
  • Test first!!!
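One way to “test first” is to run a downloaded plugin by hand and confirm it follows the basic contract before wiring it into Nagios. The sketch below is an assumption about what is worth checking (exit code in the 0 to 3 range, something on stdout, nothing on stderr, finishes in time); `vet_plugin` is a hypothetical helper, not part of Nagios.

```python
import subprocess
import sys

# Conventional plugin return codes.
VALID_CODES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def vet_plugin(argv, timeout=30):
    """Run a plugin command once and list anything that breaks the basic contract."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return [f"no result within {timeout}s"]
    except OSError as err:
        return [f"could not execute: {err}"]
    problems = []
    if proc.returncode not in VALID_CODES:
        problems.append(f"unexpected exit code {proc.returncode}")
    if not proc.stdout.strip():
        problems.append("nothing written to stdout")
    if proc.stderr.strip():
        # Assumption: stderr output is treated as suspicious for a plugin.
        problems.append("wrote to stderr")
    return problems
```

For example, `vet_plugin(["/path/to/check_something", "-H", "testhost"])` (a made-up command line) returns an empty list when the plugin behaves, and a list of complaints otherwise. Running it against both a healthy and a deliberately broken target is a quick way to confirm the plugin actually catches problems.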

Performance Data
• Metrics about the state of the service
• Can be used to generate graphs showing trends, etc.
• Performance data processing requires some external add-on like PNP4Nagios
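Performance data rides along on the plugin's output line, after a `|` character, as `label=value[UOM];warn;crit;min;max` items. A small sketch of building that string follows; the helper names `perfdata` and `with_perfdata` are hypothetical, and the sample metrics are made up.

```python
def perfdata(label, value, uom="", warn=None, crit=None, minimum=None, maximum=None):
    """Format one metric as label=value[UOM];warn;crit;min;max."""
    if " " in label:
        # Labels containing spaces are wrapped in single quotes.
        label = f"'{label}'"
    fields = [f"{label}={value}{uom}"]
    fields += ["" if x is None else str(x) for x in (warn, crit, minimum, maximum)]
    # Trailing empty fields can be dropped.
    return ";".join(fields).rstrip(";")


def with_perfdata(status_text, *metrics):
    """Append space-separated metrics to the human-readable status text."""
    return f"{status_text} | {' '.join(metrics)}"


print(with_perfdata(
    "DISK OK - /var is 71% full",
    perfdata("var used", 71, "%", warn=80, crit=90, minimum=0, maximum=100),
))
```

An add-on such as PNP4Nagios can then pick up the part after the `|` and graph it over time.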

My Plugins Framework
• https://github.com/48kRAM/nagios-plugins
• Perl
• Net::SNMP
• Plugin for APC Smart-UPS,

Agent-less vs Agent-full Checks
Agent-less
• No agent installed on the monitored host
• All check plugins run on the monitoring server
• Service to be monitored must be network-accessible
• Default mode of Nagios
Agent-full
• Must install agent on server to be monitored
• Check logic runs on monitored host
• Can access non-network services
• SNMP can be a powerful agent for checks
• Server-specific agents

USING NAGIOS

About Nagios Replacements
When Nagios went commercial, the “open-source community” decided that it needed not one, not two, but three replacements for Nagios: Icinga and Naemon (forks of Nagios) and Shinken (a drop-in replacement). Most Linux distros are now shipping one or more of these compatible replacements rather than the official Nagios Core. Not a single distro I checked is shipping Nagios 4.
Shinken, Naemon, or Icinga should all work the same way as the material covered here, but I have only briefly tested Icinga and have not tested Shinken or Naemon at all.

Overview
Navbar Main window

The Tactical Overview
• Displays overview of monitored services and hosts
• Shows if
  • Any services / hosts have notifications disabled
  • Any services / hosts are flapping
  • Active / passive checks enabled / disabled
  • Warning / Critical / Okay breakdown

The Tactical Overview

Services View
Host summary Service summary

Click on Services - Critical

Host and Service Groups
• Organize services or hosts into groups by function, etc.
• Can disable alerts, schedule downtime, etc. on whole group
• Can show availability report for a whole group
  • Group services by desired reporting capability
• Groups get a unique URL so you can send a single link to check on a group of hosts
  • Great for PHBs!
  • Also great for delegated IT departments

Service Groups

Acknowledging an Outage
• Click on service name (or hostname) that has the problem
• Under “Service Commands”
  • Click “Acknowledge this service problem”
• You must enter a comment about why you are acknowledging the problem (e.g., “Bob is working on it”)
• Click “Commit”

Acknowledging an Outage
Click Here

Acknowledging an Outage

SMS pages
• Configure a contact to use an email-to-SMS gateway
• Some carriers require an MMS gateway to process the ‘From’ address
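An email-to-SMS page is just a short email addressed to the phone number at the carrier's gateway domain. The sketch below builds such a message with Python's standard library. The gateway domain `sms.example.com`, the sender address, and the 160-character limit are placeholder assumptions, and `sms_page` is a hypothetical helper; real gateway domains and limits vary by carrier.

```python
from email.message import EmailMessage


def sms_page(number, gateway_domain, sender, text, limit=160):
    """Build a short email that a carrier gateway can turn into a text message."""
    msg = EmailMessage()
    # Some gateways show or filter on the From address, so set it explicitly.
    msg["From"] = sender
    msg["To"] = f"{number}@{gateway_domain}"
    # Keep the body short; the limit here is an assumed single-SMS length.
    msg.set_content(text[:limit])
    return msg


page = sms_page(
    "5551234567",            # placeholder phone number
    "sms.example.com",       # placeholder gateway domain
    "nagios@example.com",    # placeholder sender
    "CRITICAL: web01 / HTTP - connection refused",
)
print(page["To"])
```

Sending it would be a call such as `smtplib.SMTP("localhost").send_message(page)`, assuming a mail server is listening locally. In Nagios itself the same effect comes from pointing a contact's email or pager address at the gateway.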

Add-ons to Consider
• PNP4Nagios - Performance data graphing
• NConf - Web-based configurator for Hosts, Services, etc.
• NagiosQL - Web-based admin tool for Nagios
• NDOUtils - Export data from Nagios to MySQL

THANK YOU!
Previous talks available at:
https://blogs.nrao.edu/jmalone/talks/