Nagios Conference 2013 - Mike Weber - Distributed Monitoring with Raspberry Pi
Distributed Monitoring
with Raspberry Pi
Mike Weber
The Problem: Remote Monitoring at Low Cost
Limited Service Checks
Limited Cost
Low Power Usage
Central Nagios Server
Low Tech Skills
Possible Solutions
Virtual Container
Requires VMware or a Similar Hypervisor
Requires Expertise to Configure Nagios
Hardware
Cost
Resource Waste
Tech Skills Required (RAID, Nagios Config)
Passive Checks
Scripts on Hosts (more resources than compiled plugins)
Tech Skills
Possible Solutions: ITX
Mini-ITX ($400-600)
6.7 x 6.7 inch motherboard developed by VIA in 2001
Intel Atom 1.8 GHz Processor
2 GB of RAM
SSD
60 Watt Power Supply
Nano-ITX ($500-700)
4.7 x 4.7 inch motherboard developed by VIA in 2003
VIA 1.2 GHz Processor
1 GB of RAM
SSD
60 Watt Power Supply
Pico-ITX ($600-700)
3.9 x 2.8 inch motherboard developed by VIA in 2007
Raspberry Pi
Low Cost
$75.00 (board, case, power supply)
Low Power Usage
Power Usage of a Cell Phone
Low Tech Skills
Clone Disks
Distributed Model
Flexible
Low Cost on Nagios Server
Pi: 512 MB RAM, 700 MHz CPU
Installation of wheezy-raspbian
Download the image file which is about 500 MB: http://www.raspberrypi.org/downloads
Verify the Image
sha1sum 2013-02-09-wheezy-raspbian.zip
b4375dc9d140e6e48e0406f96dead3601fac6c81  2013-02-09-wheezy-raspbian.zip
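The manual checksum comparison can be wrapped in a small helper so that a corrupted download fails loudly instead of relying on reading two long hex strings by eye. This is a minimal POSIX sh sketch, assuming GNU coreutils `sha1sum` is available; the function name `verify_image` is hypothetical.

```shell
#!/bin/sh
# verify_image FILE EXPECTED_SHA1
# Compare the SHA-1 of FILE with the published checksum.
# Prints "OK" and returns 0 on a match; otherwise prints the actual
# hash to stderr and returns 1.
verify_image() {
    file=$1
    expected=$2
    # sha1sum prints "<hash>  <file>"; keep only the hash field
    actual=$(sha1sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK"
        return 0
    fi
    echo "MISMATCH: $file has SHA-1 $actual" >&2
    return 1
}
```

Usage: `verify_image 2013-02-09-wheezy-raspbian.zip b4375dc9d140e6e48e0406f96dead3601fac6c81`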
Unzip the Image
unzip 2013-02-09-wheezy-raspbian.zip
Archive:  2013-02-09-wheezy-raspbian.zip
  inflating: 2013-02-09-wheezy-raspbian.img
Username: pi Password: raspberry
Verify Disk Location
su -
fdisk -l
Disk /dev/sdd: 4102 MB, 4102889984 bytes
255 heads, 63 sectors/track, 498 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x295b8178
Device Boot Start End Blocks Id System
/dev/sdd1 1 497 3992135+ b W95 FAT32
Create Disk
dd bs=4M if=~/2013-02-09-wheezy-raspbian.img of=/dev/sdd
Network Configuration: Wireless
Edimax Wireless 802.11b/g/n (supports WPS, WPA2, 802.1X)
* works out of the box
/etc/network/interfaces
auto lo
iface lo inet loopback
iface eth0 inet dhcp
allow-hotplug wlan0
iface wlan0 inet dhcp
  wpa-ssid pi
  wpa-psk Pi89YQbg56)
Mod-Gearman
Why Mod-Gearman?
Distributes Tasks to Multiple Workers
Multiple Pi Workers
Supports Multiple Programming Languages
C, Java, Perl, PHP, Python, Shell
Provides a Distributed Model
Client Uses Very Few Resources
In Contrast to DNX Workers
Why Not DNX?
Not Currently Updated (last update 2010-04-13)
Uses UDP (less dependable)
Client Uses More Resources
NEB: Nagios Event Broker
Mod-Gearman
Installation of Mod-Gearman on Pi
Install Prerequisites
sudo apt-get update
sudo apt-get install gearman mod-gearman-worker libgearman6 nagios-plugins
cd /etc/mod-gearman
Edit the worker.conf
sudo nano worker.conf
server=192.168.5.212:4730
key=Modlinux23
hosts=no
services=no
eventhandler=no
min-worker=6
max-worker=8
servicegroups=pi_srv
logfile=/var/log/mod_gearman/mod_gearman_worker.log
p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
Save your changes and then start the Mod-Gearman worker:
sudo /etc/init.d/mod-gearman-worker start
Gearman Resource Usage
ps axo pid,ppid,pcpu,size,cmd|grep gearman
Process Parent CPU Memory CMD
1747 1 0.0 1224 /usr/sbin/mod_gearman_worker
3255 1747 2.5 1488 /usr/sbin/mod_gearman_worker (working)
3256 1747 6.6 1488 /usr/sbin/mod_gearman_worker (working)
3257 1747 7.0 1488 /usr/sbin/mod_gearman_worker (working)
3258 1747 0.0 1356 /usr/sbin/mod_gearman_worker
3259 1747 0.0 1356 /usr/sbin/mod_gearman_worker
3260 1747 0.0 1356 /usr/sbin/mod_gearman_worker
size = virtual size of the process (code+data+stack)
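The ps listing above can be totalled automatically to see how many workers exist, how many are busy, and how much memory they use together. A minimal sketch, assuming the `ps axo pid,ppid,pcpu,size,cmd` column order shown on the slide and that busy workers carry the `(working)` suffix as in that output; the function name `summarize_workers` is hypothetical. It matches on the command field rather than the whole line so that the awk process itself is not counted, which is the same problem the `grep -v grep` later in the deck works around.

```shell
#!/bin/sh
# summarize_workers: read `ps axo pid,ppid,pcpu,size,cmd` output on stdin
# and summarize the mod_gearman_worker processes (including the parent).
summarize_workers() {
    awk '$5 ~ /mod_gearman_worker$/ {
            n++                              # every worker process
            cpu += $3                        # %CPU column
            kb  += $4                        # size column, in KiB
            if ($6 == "(working)") busy++    # process title of a busy worker
         }
         END { printf "workers=%d busy=%d size_kb=%d cpu=%.1f\n", n, busy, kb, cpu }'
}
```

Usage: `ps axo pid,ppid,pcpu,size,cmd | summarize_workers`. For the listing above this reports 7 processes, 3 of them busy, 9756 KiB in total.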
Mod-Gearman Queues
Mod-Gearman
Worker Capacity
75-100 Service Checks
5 Minute Intervals
Compiled Plugins
6 Workers
2 Workers Always Available
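The capacity figures can be turned into a rough time budget per check. The formula is an assumption made for illustration, not a Mod-Gearman rule: with W workers running checks in parallel and C checks due every I seconds, the average check has to finish within W * I / C seconds or the queue grows. It ignores bursts, scheduling and worker spawn time. Reading "2 Workers Always Available" as 4 of the 6 workers doing the steady work is also an assumption. The function name `check_budget` is hypothetical.

```shell
#!/bin/sh
# check_budget CHECKS INTERVAL_SECONDS WORKERS
# Print the average number of seconds each check may take before the
# workers fall behind: WORKERS * INTERVAL_SECONDS / CHECKS.
check_budget() {
    awk -v c="$1" -v i="$2" -v w="$3" 'BEGIN { printf "%.1f\n", w * i / c }'
}
```

For 100 checks every 300 seconds on 4 busy workers, `check_budget 100 300 4` gives 12.0 seconds per check on average, so a plugin such as a multi-packet `check_ping` fits comfortably, while slow Perl plugins eat into the margin quickly.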
Mod-Gearman Worker Configuration
identifier
Unique identifier for the worker, e.g. its hostname
min-worker
Minimum number of total workers
max-worker
Maximum number of total workers
idle-timeout
Time in seconds before idle worker exits
max-jobs
Maximum number of jobs before worker exits
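Because min-worker and max-worker are plain key=value lines, a quick sanity check can catch a minimum set above the maximum before the worker is restarted. A minimal sketch, assuming the file uses `key=value` lines without spaces around `=` (as on the slides) and `#` comments; the function name `check_worker_conf` is hypothetical and not part of Mod-Gearman.

```shell
#!/bin/sh
# check_worker_conf FILE
# Report min-worker and max-worker from a worker.conf-style file and
# return 1 if min-worker is larger than max-worker.
check_worker_conf() {
    awk -F= '
        /^#/ { next }                        # skip comment lines
        $1 == "min-worker" { min = $2 }
        $1 == "max-worker" { max = $2 }
        END {
            printf "min-worker=%s max-worker=%s\n", min, max
            if (min + 0 > max + 0) { print "ERROR: min-worker > max-worker"; exit 1 }
        }' "$1"
}
```

Usage: `check_worker_conf /etc/mod-gearman/worker.conf`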
Install Process
Install Nagios Event Broker
broker_module=/usr/local/lib/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
(one line in nagios.cfg)
Install Server: gearmand
/etc/init.d/gearmand start
Install Worker: mod_gearman_worker
/etc/init.d/mod_gearman_worker start
Configuration File
/etc/mod_gearman/mod_gearman_neb.conf
Distributed Monitoring
Distributed Monitoring: Hostgroups
Server Configuration:
/etc/mod_gearman/mod_gearman_neb.conf
server=localhost:4730
eventhandler=yes
services=yes
hosts=yes
hostgroups=debian-servers
encryption=yes
key=linux23_Qg549K
Pi Worker Configuration: /etc/mod-gearman/worker.conf
server=192.168.5.99:4730
eventhandler=no
services=no
hosts=no
min-worker=6
max-worker=8
encryption=yes
key=linux23_Qg549K
p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
hostgroups=debian-servers
Distributed Monitoring: Servicegroups
Server Configuration:
/etc/mod_gearman/mod_gearman_neb.conf
server=localhost:4730
eventhandler=yes
services=yes
hosts=yes
servicegroups=pi_srv
encryption=yes
key=linux23_Qg549K
Pi Worker Configuration: /etc/mod-gearman/worker.conf
server=192.168.5.99:4730
eventhandler=no
services=no
hosts=no
min-worker=6
max-worker=8
encryption=yes
key=linux23_Qg549K
p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
servicegroups=pi_srv
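In both layouts the server module and the Pi worker share the same `key`; with `encryption=yes` on both sides, a worker with a different key cannot decrypt the jobs it receives. A minimal sketch that compares the two files, assuming `key=value` lines without spaces around `=`; the function names `conf_value` and `keys_match` are hypothetical.

```shell
#!/bin/sh
# conf_value FILE KEY: print the value of the last KEY=value line in FILE
# (everything after the first "=", so values may themselves contain "=").
conf_value() {
    awk -F= -v k="$2" '$1 == k { v = substr($0, length(k) + 2) } END { print v }' "$1"
}

# keys_match NEB_CONF WORKER_CONF
# Print "keys match" if both files set the same non-empty key; otherwise
# report a mismatch on stderr and return 1.
keys_match() {
    a=$(conf_value "$1" key)
    b=$(conf_value "$2" key)
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "keys match"
    else
        echo "key mismatch" >&2
        return 1
    fi
}
```

Usage (the worker file must first be copied to the server, or vice versa): `keys_match /etc/mod_gearman/mod_gearman_neb.conf worker.conf`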
Performance Tuning Pi
noatime
mtime: contents of file changed
ctime: inode changed (permissions, ownership)
atime: time of last access; updating it turns every read into a write on the SD card
/etc/fstab
proc /proc proc defaults 0 0
/dev/mmcblk0p1 /boot vfat defaults 0 2
/dev/mmcblk0p2 / ext4 defaults,noatime 0 1
mount -o remount /
Verify Changes with:
mount
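Instead of reading the full `mount` output, the root filesystem's options can be tested directly. A minimal sketch, assuming a Linux `/proc/mounts`; the function names `has_noatime` and `root_mount_options` are hypothetical. `tail -n 1` is used because some kernels list an extra `rootfs` entry for `/` before the real root device.

```shell
#!/bin/sh
# has_noatime OPTIONS
# Return 0 if the comma-separated mount OPTIONS string contains "noatime".
has_noatime() {
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

# root_mount_options: print the mount options of / from /proc/mounts
root_mount_options() {
    awk '$2 == "/" { print $4 }' /proc/mounts | tail -n 1
}
```

Usage: `has_noatime "$(root_mount_options)" && echo "noatime active on /"`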
Maximize Resources
Reduce Logging
* Turn Off rsyslog
* Minimize Logging
Shutdown Other Services
* mail server
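On Raspbian wheezy, which uses SysV init scripts, a service can be stopped now and kept from starting at boot with its init script plus `update-rc.d NAME disable`. A minimal sketch with a dry-run switch so the commands can be reviewed first; the function name `disable_services` and the `DRY_RUN` variable are hypothetical, and `exim4` in the usage line is only an assumed name for the mail server (Debian's default MTA when one is installed).

```shell
#!/bin/sh
# disable_services NAME...
# Stop each SysV init service and disable it at boot. Run as root.
# With DRY_RUN set to a non-empty value, print the commands instead.
disable_services() {
    for svc in "$@"; do
        if [ -n "$DRY_RUN" ]; then
            echo "/etc/init.d/$svc stop && update-rc.d $svc disable"
        else
            /etc/init.d/"$svc" stop && update-rc.d "$svc" disable
        fi
    done
}
```

Usage: `DRY_RUN=1 disable_services rsyslog exim4`, then rerun without `DRY_RUN` once the list looks right.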
Firewall Issues
Understanding Network Connections: Pi
tcp 0 0 192.168.5.47:43965 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43948 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43964 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43962 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43960 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43977 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43956 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43947 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43975 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43969 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43978 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43967 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43973 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43959 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43951 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43961 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43957 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43963 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43976 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43945 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43972 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43970 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43950 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43958 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43952 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43955 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43954 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43946 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43966 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43968 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43953 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43979 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43971 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43974 192.168.5.212:4730 ESTABLISHED
Understanding Network Connections: Nagios
tcp 0 0 0.0.0.0:4730 0.0.0.0:* LISTEN
tcp 0 0 192.168.5.212:4730 192.168.5.47:44254 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44258 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44257 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44255 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44259 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44253 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44256 ESTABLISHED
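Both listings show the worker on the Pi (192.168.5.47) opening connections to port 4730 on the Nagios server (192.168.5.212); nothing connects back to the Pi, so a firewall between them only needs to allow TCP from the Pi to the server on 4730. Counting states makes the two views easy to compare. A minimal sketch, assuming the `netstat -tn` column layout shown above (local address in field 4, foreign address in field 5, state in field 6); the function name `gearman_conn_states` is hypothetical.

```shell
#!/bin/sh
# gearman_conn_states [PORT]
# Read netstat -tn style output on stdin and count TCP connections whose
# local or foreign address uses PORT (default 4730), grouped by state.
gearman_conn_states() {
    awk -v port="${1:-4730}" '
        $1 ~ /^tcp/ && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) { n[$6]++ }
        END { printf "LISTEN=%d ESTABLISHED=%d TIME_WAIT=%d\n", n["LISTEN"], n["ESTABLISHED"], n["TIME_WAIT"] }'
}
```

Usage: `netstat -tn | gearman_conn_states` on the Pi, and `netstat -tan | gearman_conn_states` on the server to include the listening socket. For the Pi listing above this counts 7 established and 27 TIME_WAIT connections; the 7 established sockets match the 7 on the server side.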
Creating Checks
Create Service Check
Create Servicegroup
Add Services to Servicegroup
Graphing with Pi Checks
Monitoring Pi
Monitor Pi: Workers and Jobs
Create a Script on Nagios to Monitor Workers and Jobs
#!/bin/bash
check_gearman -H 192.168.5.99 -q worker_raspberrypi -t 10 -s check
Monitor Pi: Service Check
Monitor Gearman Workers
Monitor Gearman Workers/Jobs
Warning Signals
Nagios Server: Check Latency
Nagios Server: Orphaned Checks
service check orphaned, is the mod-gearman worker on queue 'servicegroup_pi' running?
Pi: Load Over 1
1 = 100% (single-core Pi)
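On the single-core Pi a 1-minute load average of 1.0 means the CPU is fully busy, so load divided by the number of cores gives a percentage. A minimal sketch, assuming Linux `/proc/loadavg`; the function names `load_percent` and `load_status` and the `OVERLOADED` label are hypothetical.

```shell
#!/bin/sh
# load_percent LOAD CORES
# Print LOAD as a percentage of CPU capacity and flag anything over 100%.
load_percent() {
    awk -v l="$1" -v c="$2" 'BEGIN {
        pct = 100 * l / c
        printf "%.0f%% %s\n", pct, (pct > 100 ? "OVERLOADED" : "OK")
    }'
}

# load_status [CORES]: use the 1-minute load from /proc/loadavg
# (CORES defaults to 1, as on the original single-core Pi).
load_status() {
    load_percent "$(cut -d ' ' -f 1 /proc/loadavg)" "${1:-1}"
}
```

Usage: `load_status` on the Pi, or `load_status 4` on a four-core machine.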
Pi: Defunct Workers
15824 14129 2.1 0 [mod_gearman_wor]
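A defunct worker shows a size of 0 and a bracketed, truncated name, as above. Asking ps for the process state makes these zombies easy to count. A minimal sketch, assuming `ps axo stat,pid,ppid,comm` output, where zombies have a state starting with `Z` and `comm` is truncated to 15 characters (`mod_gearman_wor`); the function name `count_defunct` is hypothetical.

```shell
#!/bin/sh
# count_defunct: read `ps axo stat,pid,ppid,comm` on stdin and print the
# number of zombie (state Z) mod_gearman worker processes.
count_defunct() {
    awk '$1 ~ /^Z/ && $4 ~ /^mod_gearman_wor/ { n++ } END { print n + 0 }'
}
```

Usage: `ps axo stat,pid,ppid,comm | count_defunct`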
Pi: Overloaded
Load Approaching Limit
ps axo pid,ppid,pcpu,size,cmd|grep gearman|grep -v grep
pid ppid pcpu size cmd
14129 1 0.0 1224 /usr/sbin/mod_gearman_worker
15634 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15635 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15636 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15637 14129 13.0 1488 /usr/sbin/mod_gearman_worker
15638 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15639 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15640 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15641 14129 11.0 1488 /usr/sbin/mod_gearman_worker
15642 14129 11.0 1488 /usr/sbin/mod_gearman_worker
Increased CPU Usage Indicating Impending DOOM
ps axo pid,ppid,pcpu,size,cmd|grep gearman|grep -v grep
pid ppid pcpu size cmd
14129 1 0.0 1224 /usr/sbin/mod_gearman_worker
15658 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15659 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15660 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15661 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15662 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15663 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15664 14129 21.0 1488 /usr/sbin/mod_gearman_worker
15665 14129 21.0 1488 /usr/sbin/mod_gearman_worker
15666 14129 21.0 1488 /usr/sbin/mod_gearman_worker
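The workers that stand out in a listing like this can be pulled out by a %CPU threshold. Note that ps reports %CPU as CPU time divided by the process lifetime, not an instantaneous reading, so a fresh worker that has just done heavy work shows a high value. A minimal sketch, assuming the `ps axo pid,ppid,pcpu,size,cmd` column order used on these slides; the function name `hot_workers` and the threshold in the usage line are hypothetical.

```shell
#!/bin/sh
# hot_workers THRESHOLD
# Read `ps axo pid,ppid,pcpu,size,cmd` on stdin and print "PID %CPU" for
# every mod_gearman_worker process above THRESHOLD percent CPU.
hot_workers() {
    awk -v t="$1" '$5 ~ /mod_gearman_worker$/ && $3 + 0 > t + 0 { print $1, $3 }'
}
```

Usage: `ps axo pid,ppid,pcpu,size,cmd | hot_workers 10`. For the listing above this prints workers 15664, 15665 and 15666 at 21.0.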
Plugin Resource Usage: RAM
Plugin Resource Use: Time
Example: check_ping
PID PPID CPU RAM Time Command
12106 12105 0.0 280 00:01 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
12106 12105 0.0 280 00:02 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
12106 12105 0.0 280 00:03 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
Plugin Resource Hog: Network Bandwidth
CPU RAM Time Plugin
13.0 7696 00:01 20 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
6.5 7696 00:02 20 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
4.3 7696 00:03 20 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
3.2 7696 00:04 20 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
2.6 7696 00:05 20 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
2.1 7696 00:06 15 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
1.8 7696 00:07 15 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
1.6 7696 00:08 15 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
1.4 7696 00:09 15 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
1.3 7696 00:10 15 /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
Latency Evaluation
Turn On Debug=1
[2013-08-20 10:24:36][11574][DEBUG] received job for queue servicegroup_pi_srv: centos - FTP
[2013-08-20 10:24:36][11574][DEBUG] service: 'centos' - 'FTP', next_check is at 2013-08-20 10:24:36, latency so far: 0
[2013-08-20 10:25:17][11574][DEBUG] received job for queue servicegroup_pi_srv: centos - HTTP
[2013-08-20 10:25:17][11574][DEBUG] service: 'centos' - 'HTTP', next_check is at 2013-08-20 10:25:17, latency so far: 0
[2013-08-20 10:25:17][11574][DEBUG] service job completed: centos HTTP: 2
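With debug output enabled, the worker log can be scanned for the worst latency instead of reading it line by line. A minimal sketch, assuming one log entry per line and integer values in the `latency so far: N` field, as in the sample above; the function name `max_latency` is hypothetical. The log path is whatever `logfile=` points to in worker.conf.

```shell
#!/bin/sh
# max_latency: read a Mod-Gearman worker debug log on stdin, extract each
# "latency so far: N" value and print the largest (0 if none are found).
max_latency() {
    sed -n 's/.*latency so far: \([0-9][0-9]*\).*/\1/p' |
        awk 'BEGIN { max = 0 } $1 + 0 > max { max = $1 + 0 } END { print max }'
}
```

Usage: `max_latency < /var/log/mod_gearman/mod_gearman_worker.log`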
Troubleshooting: Return code 127
CRITICAL: Return code of 127 is out of bounds. Make sure the plugin you're trying to run actually exists. (worker: raspberrypi)
Check the Path to the plugins directory.
sudo mkdir -p /usr/local/nagios
sudo ln -s /usr/lib/nagios/plugins /usr/local/nagios/libexec
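Exit status 127 is the shell's "command not found". Here the check commands defined on the Nagios server point at /usr/local/nagios/libexec, while the Debian nagios-plugins package on the Pi installs them under /usr/lib/nagios/plugins, which the symlink bridges. A minimal sketch for confirming on the Pi that each plugin used by the Pi's checks is reachable at the server's path; the function name `find_missing_plugins` is hypothetical and the plugin names in the usage line are only examples.

```shell
#!/bin/sh
# find_missing_plugins DIR PLUGIN...
# Print "missing: DIR/PLUGIN" for every plugin that is absent or not
# executable in DIR; return 1 if any are missing, 0 otherwise.
find_missing_plugins() {
    dir=$1
    shift
    missing=0
    for p in "$@"; do
        if [ ! -x "$dir/$p" ]; then
            echo "missing: $dir/$p"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `find_missing_plugins /usr/local/nagios/libexec check_ping check_http check_ftp`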
Questions?
Memory in MB
DNX Worker: 204
Mod-Gearman Worker: 3.29
RAM by check method
Compiled: 0.042
NSCA: 0.046
NSClient++: 0.3
SSH: 0.521
Perl: 0.05