Nagios Conference 2013 - Mike Weber - Distributed Monitoring with Raspberry Pi

Distributed Monitoring
with Raspberry Pi

Mike Weber

The Problem: Remote Monitoring at Low Cost

Limited Service Checks

Limited Cost

Low Power Usage

Central Nagios Server

Low Tech Skills

Possible Solutions

Virtual Container
Requires VMWare etc.
Requires Expertise to Configure Nagios

Hardware
Cost
Resource Waste
Tech Skills Required (RAID, Nagios Config)

Passive Checks
Scripts on Hosts (more resources than compiled plugins)
Tech Skills

Possible Solutions: ITX

Mini-ITX ($400-600)
6.7 x 6.7 inch motherboard developed by VIA in 2001
Intel Atom 1.8 GHz Processor
2 GB of RAM
SSD
60 Watt Power Supply

Nano-ITX ($500-700)
4.7 x 4.7 inch motherboard developed by VIA in 2003
VIA 1.2 GHz Processor
1 GB of RAM
SSD
60 Watt Power Supply

Pico-ITX ($600-700)
3.9 x 2.8 inch motherboard developed by VIA in 2007

Raspberry Pi

Low Cost
$75.00 (board, case, power supply)

Low Power Usage
Power Usage of a Cell Phone

Low Tech Skills
Clone Disks

Distributed Model
Flexible
Low Cost on Nagios Server

Pi: 512 MB RAM, 700 MHz

Installation of wheezy-raspbian

Download the image file which is about 500 MB: http://www.raspberrypi.org/downloads

Verify the Image
sha1sum 2013-02-09-wheezy-raspbian.zip
b4375dc9d140e6e48e0406f96dead3601fac6c81  2013-02-09-wheezy-raspbian.zip
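
The verification step compares the printed digest with the published one by eye. A minimal sketch of automating that comparison, assuming GNU coreutils sha1sum is installed; the function name verify_sha1 is hypothetical and not part of the talk.

```shell
# Hypothetical helper: succeed only if FILE's SHA-1 digest equals EXPECTED.
verify_sha1() {
    local file="$1" expected="$2" actual
    # sha1sum prints "<digest>  <filename>"; keep only the digest field.
    actual=$(sha1sum "$file" | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage with the digest shown on the slide (run in the download directory):
# verify_sha1 2013-02-09-wheezy-raspbian.zip \
#     b4375dc9d140e6e48e0406f96dead3601fac6c81 && echo "image OK"
```

A non-zero exit status means the digest differs or the file could not be read, so the image should be downloaded again before writing it to the card.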

Unzip the Image
unzip 2013-02-09-wheezy-raspbian.zip
Archive: 2013-02-09-wheezy-raspbian.zip
  inflating: 2013-02-09-wheezy-raspbian.img

Username: pi Password: raspberry

Verify Disk Location
su -
fdisk -l

Disk /dev/sdd: 4102 MB, 4102889984 bytes
255 heads, 63 sectors/track, 498 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x295b8178

Device Boot Start End Blocks Id System
/dev/sdd1 1 497 3992135+ b W95 FAT32

Create Disk
dd bs=4M if=~/2013-02-09-wheezy-raspbian.img of=/dev/sdd

Network Configuration: Wireless

Edimax Wireless 802.11b/g/n (supports WPS,WPA2,802.1x)
* works out of the box

/etc/network/interfaces

auto lo

iface lo inet loopback
iface eth0 inet dhcp

allow-hotplug wlan0
iface wlan0 inet dhcp
    wpa-ssid pi
    wpa-psk Pi89YQbg56)

Mod-Gearman

Why Mod-Gearman?

Distributes Tasks to Multiple Workers
Multiple Pi Workers

Supports Multiple Programming Languages
C, Java, Perl, PHP, Python, Shell

Provides a Distributed Model

Client Uses Very Small Resources
In Contrast to DNX Workers

Why Not DNX?

Not Currently Updated (2010-4-13)

Uses UDP (less dependable)

Client Uses More Resources

NEB: Nagios Event Broker

Mod-Gearman

Installation of Mod-Gearman on Pi

Install Prerequisites
sudo apt-get update
sudo apt-get install gearman mod-gearman-worker libgearman6 nagios-plugins

cd /etc/mod-gearman

Edit the worker.conf

sudo nano worker.conf

server=192.168.5.212:4730
key=Modlinux23
hosts=no
services=no
eventhandler=no
min-worker=6
max-worker=8
servicegroups=pi_srv
logfile=/var/log/mod_gearman/mod_gearman_worker.log
p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl

Save your changes and then start the Mod-Gearman worker:

sudo /etc/init.d/mod-gearman-worker start

Gearman Resource Usage

ps axo pid,ppid,pcpu,size,cmd|grep gearman

Process Parent CPU Memory CMD
1747 1 0.0 1224 /usr/sbin/mod_gearman_worker
3255 1747 2.5 1488 /usr/sbin/mod_gearman_worker (working)
3256 1747 6.6 1488 /usr/sbin/mod_gearman_worker (working)
3257 1747 7.0 1488 /usr/sbin/mod_gearman_worker (working)
3258 1747 0.0 1356 /usr/sbin/mod_gearman_worker
3259 1747 0.0 1356 /usr/sbin/mod_gearman_worker
3260 1747 0.0 1356 /usr/sbin/mod_gearman_worker

size = virtual size of the process (code+data+stack)
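
To turn the per-process listing into a single figure, the size column can be totalled. A minimal sketch, assuming ps reports size in KiB and that the fourth field is the size column as in the command above; the function name sum_worker_size is hypothetical.

```shell
# Hypothetical helper: total the size column (4th field) of ps output on stdin.
sum_worker_size() {
    awk '/mod_gearman_worker/ { total += $4; n++ }
         END { printf "%d workers, %d KiB\n", n, total }'
}

# Usage on a live Pi (the [m] keeps grep from matching itself):
# ps axo pid,ppid,pcpu,size,cmd | grep '[m]od_gearman_worker' | sum_worker_size
```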

Mod-Gearman Queues

Mod-Gearman

Worker Capacity

75-100 Service Checks
5 Minute Intervals

Compiled Plugins

6 Workers
2 Workers Always Available
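
A back-of-the-envelope way to read these numbers: with W workers and N checks scheduled every T seconds, each check can take on average W*T/N seconds before the workers fall behind. The sketch below is only that arithmetic, not a formula from the talk; it ignores host checks and scheduling jitter, and the function name check_time_budget is hypothetical.

```shell
# Average seconds a check may run before WORKERS workers fall behind,
# given CHECKS service checks scheduled every INTERVAL seconds.
check_time_budget() {
    local workers="$1" interval="$2" checks="$3"
    echo $(( workers * interval / checks ))
}

check_time_budget 6 300 100   # prints 18
check_time_budget 6 300 75    # prints 24
check_time_budget 4 300 100   # prints 12 (keeping 2 of the 6 workers free)
```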

Mod-Gearman Worker Configuration

Worker Identifier
Unique identifier for worker, hostname

min-worker
Minimum number of total workers

max-worker
Maximum number of total workers

idle-timeout
Time in seconds before idle worker exits

max-jobs
Maximum number of jobs before worker exits
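
Since min-worker and max-worker bound the same pool, a quick sanity check before restarting the worker can catch a swapped pair. A minimal sketch, assuming the plain key=value layout of worker.conf; the function name check_worker_bounds is hypothetical.

```shell
# Hypothetical helper: exit 0 if min-worker <= max-worker in FILE,
# 1 if the bounds are inverted, 2 if either value is missing or not a number.
check_worker_bounds() {
    local file="$1" min max
    min=$(sed -n 's/^min-worker=//p' "$file" | tail -n 1)
    max=$(sed -n 's/^max-worker=//p' "$file" | tail -n 1)
    [ -n "$min" ] && [ -n "$max" ] || return 2
    case "$min$max" in
        *[!0-9]*) return 2 ;;
    esac
    [ "$min" -le "$max" ]
}

# Usage: check_worker_bounds /etc/mod-gearman/worker.conf || echo "check worker limits"
```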

Install Process

Install Nagios Event Broker
broker_module=/usr/local/lib/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf

Install Server: gearmand
/etc/init.d/gearmand start

Install Worker: mod_gearman_worker
/etc/init.d/mod_gearman_worker start

Configuration File
/etc/mod_gearman/mod_gearman_neb.conf

Distributed Monitoring

Distributed Monitoring: Hostgroups

Server Configuration: /etc/mod_gearman/mod_gearman_neb.conf

server=localhost:4730
eventhandler=yes
services=yes
hosts=yes
hostgroups=debian-servers
encryption=yes
key=linux23_Qg549K

Pi Worker Configuration: /etc/mod-gearman/worker.conf

server=192.168.5.99:4730
eventhandler=no
services=no
hosts=no
min-worker=6
max-worker=8
encryption=yes
key=linux23_Qg549K
p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
hostgroups=debian-servers

Distributed Monitoring: Servicegroups

Server Configuration: /etc/mod_gearman/mod_gearman_neb.conf

server=localhost:4730
eventhandler=yes
services=yes
hosts=yes
servicegroups=pi_srv
encryption=yes
key=linux23_Qg549K

Pi Worker Configuration: /etc/mod-gearman/worker.conf

server=192.168.5.99:4730
eventhandler=no
services=no
hosts=no
min-worker=6
max-worker=8
encryption=yes
key=linux23_Qg549K
p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
servicegroups=pi_srv
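
In both the hostgroup and the servicegroup examples, the encryption key and the group name are identical on the server and on the Pi. A minimal sketch for comparing one setting across two copies of the files; the function name same_setting is hypothetical, and strict equality is a simplification because a server may list more groups than a single worker serves.

```shell
# Hypothetical helper: succeed if KEY is set and has the same value in both files.
same_setting() {
    local key="$1" a b
    a=$(sed -n "s/^${key}=//p" "$2" | tail -n 1)
    b=$(sed -n "s/^${key}=//p" "$3" | tail -n 1)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage, after copying the Pi's worker.conf next to the server's NEB config:
# for k in encryption key servicegroups; do
#     same_setting "$k" mod_gearman_neb.conf worker.conf || echo "mismatch: $k"
# done
```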

Performance Tuning Pi

noatime

mtime
contents of file changed

ctime
inode changed (permissions,ownership)

atime
accessed time forces a write

/etc/fstab
proc /proc proc defaults 0 0
/dev/mmcblk0p1 /boot vfat defaults 0 2
/dev/mmcblk0p2 / ext4 defaults,noatime 0 1

mount -o remount /

Verify Changes with:
mount
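
Reading the mount output by eye works; the same check can be scripted against /etc/fstab. A minimal sketch, assuming the standard six-column fstab layout; the function name fstab_has_option is hypothetical.

```shell
# Hypothetical helper: succeed if MOUNTPOINT lists OPTION in fstab-format input on stdin.
fstab_has_option() {
    awk -v mp="$1" -v opt="$2" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        $2 == mp {
            n = split($4, opts, ",")           # 4th column holds the mount options
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit !found }
    '
}

# Usage: fstab_has_option / noatime < /etc/fstab && echo "noatime set for /"
```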

Maximize Resources

Reduce Logging
* Turn Off rsyslog
* Minimize Logging

Shutdown Other Services
* mail server

Firewall Issues

Understanding Network Connections: Pi

tcp 0 0 192.168.5.47:43965 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43948 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43964 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43962 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43960 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43977 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43956 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43947 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43975 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43969 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43978 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43967 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43973 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43959 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43951 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43961 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43957 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43963 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43976 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43945 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43972 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43970 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43950 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43958 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43952 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43955 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43954 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43946 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43966 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43968 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43953 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43979 192.168.5.212:4730 ESTABLISHED
tcp 0 0 192.168.5.47:43971 192.168.5.212:4730 TIME_WAIT
tcp 0 0 192.168.5.47:43974 192.168.5.212:4730 ESTABLISHED
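
Every connection in this listing is opened from the Pi toward port 4730 on the Nagios server, so that is the direction a firewall has to allow. A minimal sketch for summarizing such a listing by connection state, assuming netstat-style columns with the state in the sixth field; the function name count_tcp_states is hypothetical.

```shell
# Hypothetical helper: count TCP connections per state from netstat-style lines on stdin.
count_tcp_states() {
    awk '$1 == "tcp" { count[$6]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on the Pi: netstat -tn | grep ':4730' | count_tcp_states
```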

Understanding Network Connections: Nagios

tcp 0 0 0.0.0.0:4730 0.0.0.0:* LISTEN
tcp 0 0 192.168.5.212:4730 192.168.5.47:44254 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44258 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44257 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44255 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44259 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44253 ESTABLISHED
tcp 0 0 192.168.5.212:4730 192.168.5.47:44256 ESTABLISHED

Creating Checks

Create Service Check

Create Servicegroup

Add Services to Servicegroup

Graphing with Pi Checks

Monitoring Pi

Monitor Pi: Workers and Jobs

Create a Script on Nagios to Monitor Workers and Jobs

#!/bin/bash
check_gearman -H 192.168.5.99 -q worker_raspberrypi -t 10 -s check

Monitor Pi: Service Check

Monitor Gearman Workers

Monitor Gearman Workers/Jobs

Warning Signals

Nagios Server: Check Latency

Nagios Server: Orphaned Checks
service check orphaned, is the mod-gearman worker on queue 'servicegroup_pi' running?

Pi: Load Over 1
1 = 100%

Pi: Defunct Workers
15824 14129 2.1 0 [mod_gearman_wor]
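
The load threshold is easy to test from a script. A minimal sketch, assuming the 1-minute load average is read from /proc/loadavg on the Pi; the function name load_over is hypothetical, and awk does the comparison because shell arithmetic is integer-only.

```shell
# Hypothetical helper: succeed if LOAD is greater than LIMIT (both may be decimals).
load_over() {
    local load="$1" limit="$2"
    awk -v l="$load" -v m="$limit" 'BEGIN { exit !(l + 0 > m + 0) }'
}

# Usage on the Pi:
# load_over "$(cut -d' ' -f1 /proc/loadavg)" 1 && echo "WARNING: load over 1"
```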

Pi: Overloaded

Load Approaching Limit

ps axo pid,ppid,pcpu,size,cmd|grep gearman|grep -v grep
pid ppid pcpu size cmd
14129 1 0.0 1224 /usr/sbin/mod_gearman_worker
15634 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15635 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15636 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15637 14129 13.0 1488 /usr/sbin/mod_gearman_worker
15638 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15639 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15640 14129 12.0 1488 /usr/sbin/mod_gearman_worker
15641 14129 11.0 1488 /usr/sbin/mod_gearman_worker
15642 14129 11.0 1488 /usr/sbin/mod_gearman_worker

Increased CPU Usage Indicating Impending DOOM
ps axo pid,ppid,pcpu,size,cmd|grep gearman|grep -v grep
pid ppid pcpu size cmd
14129 1 0.0 1224 /usr/sbin/mod_gearman_worker
15658 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15659 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15660 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15661 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15662 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15663 14129 2.1 1488 /usr/sbin/mod_gearman_worker
15664 14129 21.0 1488 /usr/sbin/mod_gearman_worker
15665 14129 21.0 1488 /usr/sbin/mod_gearman_worker
15666 14129 21.0 1488 /usr/sbin/mod_gearman_worker
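
Watching for workers whose CPU share climbs can be scripted from the same ps output. A minimal sketch, assuming the third field is pcpu as in the command above; the function name count_busy_workers and the threshold of 20 percent are illustrative choices, not values from the talk.

```shell
# Hypothetical helper: count mod_gearman_worker processes whose %CPU (3rd field)
# exceeds THRESHOLD, reading ps output from stdin.
count_busy_workers() {
    awk -v t="$1" '/mod_gearman_worker/ && $3 + 0 > t + 0 { n++ }
                   END { print n + 0 }'
}

# Usage on the Pi:
# ps axo pid,ppid,pcpu,size,cmd | grep '[m]od_gearman_worker' | count_busy_workers 20
```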

Plugin Resource Usage: RAM

Plugin Resource Use: Time

Example: check_ping


PID PPID CPU RAM Time Command
12106 12105 0.0 280 00:01 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
12106 12105 0.0 280 00:02 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
12106 12105 0.0 280 00:03 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5

Plugins Resource Hog: Network Bandwidth

CPU RAM Time Plugin
13.0 7696 00:01 20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
6.5 7696 00:02 20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
4.3 7696 00:03 20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
3.2 7696 00:04 20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
2.6 7696 00:05 20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
2.1 7696 00:06 15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
1.8 7696 00:07 15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
1.6 7696 00:08 15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
1.4 7696 00:09 15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
1.3 7696 00:10 15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl

Latency Evaluation

Turn On Debug=1

[2013-08-20 10:24:36][11574][DEBUG] received job for queue servicegroup_pi_srv: centos - FTP
[2013-08-20 10:24:36][11574][DEBUG] service: 'centos' - 'FTP', next_check is at 2013-08-20 10:24:36, latency so far: 0

[2013-08-20 10:25:17][11574][DEBUG] received job for queue servicegroup_pi_srv: centos - HTTP
[2013-08-20 10:25:17][11574][DEBUG] service: 'centos' - 'HTTP', next_check is at 2013-08-20 10:25:17, latency so far: 0
[2013-08-20 10:25:17][11574][DEBUG] service job completed: centos HTTP: 2
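
With debug logging on, the "latency so far" field can be pulled out of the log instead of read line by line. A minimal sketch, assuming the field is a whole number of seconds as in the lines above; the function name max_latency is hypothetical.

```shell
# Hypothetical helper: print the largest "latency so far" value in log lines on stdin.
max_latency() {
    sed -n 's/.*latency so far: \([0-9][0-9]*\).*/\1/p' | sort -n | tail -n 1
}

# Usage, with the logfile path set in worker.conf:
# max_latency < /var/log/mod_gearman/mod_gearman_worker.log
```

Lines without the field are ignored, so the whole log can be piped in without filtering first.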

Troubleshooting: Return code 127

CRITICAL: Return code of 127 is out of bounds. Make sure the plugin you're trying to run actually exists. (worker: raspberrypi)

Check the Path to the plugins directory.

sudo mkdir -p /usr/local/nagios
sudo ln -s /usr/lib/nagios/plugins /usr/local/nagios/libexec
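
Exit status 127 is what a shell returns when the command it was asked to run cannot be found, which is why a wrong plugin path surfaces this way. A minimal sketch that reproduces the status and tests a path before relying on it; the function name plugin_available and the file name no_such_plugin are hypothetical.

```shell
# Hypothetical helper: succeed if PLUGIN exists and is executable in DIR.
plugin_available() {
    local dir="$1" plugin="$2"
    [ -x "$dir/$plugin" ]
}

# Reproduce the failure: running a path that does not exist yields status 127.
status=0
sh -c '/usr/local/nagios/libexec/no_such_plugin' 2>/dev/null || status=$?
echo "exit status: $status"   # prints "exit status: 127"

# Usage on the Pi:
# plugin_available /usr/local/nagios/libexec check_ping || echo "plugin path is wrong"
```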




Questions?

Memory in MB
DNX Worker: 204
Mod-Gearman Worker: 3.29

Compiled, NSCA, NSClient++, SSH, Perl

RAM0.0420.0460.30.5210.05
