UNIVERSITY OF MARIBOR

FACULTY OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Ivana Kelemen

DISTRIBUTED NETWORK MONITORING WITH ZABBIX

Master's thesis

Maribor, September 2016

DISTRIBUTED NETWORK MONITORING WITH ZABBIX

Master's thesis

Student: Ivana Kelemen, bacc.ing.graph.teh.

Study programme: second-cycle study programme Telecommunications

Specialization: -

Supervisor: Assoc. Prof. Dr. Andrej Žgank

Co-supervisor: Asst. Prof. Dr. Janez Stergar

Language editor: Marija Marčetić, mag.philol.angl.

Distributed network monitoring with Zabbix

Key words: Zabbix, IP network, network monitoring

UDK: 004.774.6(043.2)

Summary

Zabbix is an open-source network monitoring system based on a server-agent hierarchy. Through web access, it allows the administrator to monitor the state of the network in real time and to review past changes in the state of individual metrics. All collected data is stored in a common database, which makes it easy for Zabbix to correlate data that seem unrelated at first glance. Distributed monitoring can be implemented with proxy servers. High-performance real-time monitoring means that several thousand servers, virtual machines and network devices can be monitored simultaneously.

Distributed network monitoring with Zabbix

Key words: Zabbix, IP network, network monitoring

UDK: 004.774.6(043.2)

Abstract

Zabbix is an open-source network monitoring system based on an agent-server hierarchy. It allows the administrator to monitor the network status in real time through the included web interface, and it also makes it possible to analyze historical changes of individual metrics. All collected data is stored in a single database, which makes it easy to correlate data that may seem unrelated at first glance. Distributed monitoring can be performed with proxy servers. High-performance real-time monitoring makes it possible to monitor thousands of servers, virtual machines and network devices simultaneously.

Table of Contents

1. Introduction
2. Distributed monitoring systems
2.1. The importance of monitoring
2.2. Zabbix
2.3. Other distributed monitoring systems
3. Zabbix installation
3.1. Server installation
3.1.1. Requirements
3.1.2. Space usage
3.1.3. Time synchronization
3.1.4. The installation process
3.2. Agent installation
4. Monitoring with Zabbix
4.1. Discovery
4.1.1. Network discovery
4.1.2. Agent auto-registration
4.1.3. Low-level auto discovery
4.2. Web monitoring
4.3. Distributed monitoring
4.3.1. Why distributed?
4.3.2. Proxies
4.3.3. Security
4.4. High availability and failover
4.4.1. Levels of IT service
4.5. Zabbix and high availability
4.5.1. Zabbix server and web interface
4.5.2. Zabbix database
5. Handling data
5.1. Data collection
5.2. Data flow
5.2.1. Zabbix items
5.2.2. Zabbix trappers
5.3. Data visualization
5.3.1. Graphs
5.3.2. Maps
5.3.3. Screens and slideshows
5.3.4. IT Services
5.4. Incident management
5.4.1. Triggers
5.4.2. Actions
5.4.3. Trigger dependencies
5.5. Templates
5.5.1. Macros
5.5.2. Linking templates to hosts
5.5.3. Nesting templates
5.5.4. Discovering hosts
5.6. Reports and capacity planning
5.6.1. Availability reports
5.6.2. Trigger frequency reports
5.6.3. Capacity planning
6. Integration
6.1. Third-party tools and applications
6.1.1. External scripts and templates
6.1.2. Android, iOS and desktop applications
6.2. API
6.3. Use-cases
6.3.1. PagerDuty
6.3.2. Ansible
6.3.3. Issue tracking systems
6.3.4. Importing Cacti data into Zabbix
6.4. Zabbix and the Internet of Things
7. Conclusion
References

Table of figures

Figure 4.1 An example of a network discovery rule
Figure 4.2 The number of discovered devices on the Zabbix dashboard
Figure 4.3 Auto-registration action parameters
Figure 4.4 Mounted filesystem discovery rule
Figure 4.5 Prototypes for the filesystem discovery rule
Figure 4.6 Parameters of the Zabbix web scenario
Figure 4.7 Web scenario – response time graph
Figure 4.8 An example of an environment with a Zabbix proxy
Figure 5.1 The agent decides on the status of the measurement
Figure 5.2 The agent takes the measurement and sends the data to the server
Figure 5.3 Zabbix item elements
Figure 5.4 A simple dashboard
Figure 5.5 Zabbix server CPU usage graph
Figure 5.6 Custom graph – outgoing network traffic
Figure 5.7 A sample screen showing the CPU load and utilization

List of tables

Table 2.1 A comparison of similar monitoring systems
Table 3.1 Hardware requirements according to environment size
Table 3.2 Possible database engines
Table 3.3 Zabbix software requirements
Table 4.1 Possible events resulting from network discovery
Table 4.2 Group creation parameters
Table 4.3 Common availability percentages with corresponding downtime intervals
Table 5.1 Severity levels with corresponding definitions and suggested colors
Table 5.2 Examples of predefined macros
Table 5.3 Information fields in the “Status of Zabbix” report
Table 6.1 Ticket severity and priority values

List of abbreviations

API Application Programming Interface

CPU Central Processing Unit

DMZ Demilitarized Zone

DNS Domain Name System

FTP File Transfer Protocol

GPL General Public License

HTTP Hypertext Transfer Protocol

HTTPS Hypertext Transfer Protocol Secure

IP Internet Protocol

IPMI Intelligent Platform Management Interface

IPv4 Internet Protocol version 4

IPv6 Internet Protocol version 6

IT Information Technology

JMX Java Management Extensions

JSON JavaScript Object Notation

KVM Kernel-based Virtual Machine

LLD Low-level discovery

LTS Long Term Support

LXC Linux Containers

NRPE Nagios Remote Plugin Executor

NTP Network Time Protocol

ODBC Open Database Connectivity

PC Personal Computer

PHP Hypertext Preprocessor

POP3 Post Office Protocol version 3

RAM Random Access Memory

RDBMS Relational Database Management System

RPC Remote Procedure Call

RRDtool Round-robin database tool

SELinux Security-enhanced Linux

SLA Service Level Agreement

SMTP Simple Mail Transfer Protocol

SNMP Simple Network Management Protocol

SSH Secure Shell

SSL Secure Sockets Layer

TCP Transmission Control Protocol

TLS Transport Layer Security

URL Uniform Resource Locator

ZODB Zope Object Database

1. Introduction

“Uptime is an illusion caused by lack of monitoring.”

– Unknown (DevOpsDays Ljubljana, April 2015)

The main scope of this thesis is the monitoring of computer networks. A computer network can be defined as a system of independent computers interconnected for the purpose of exchanging information and sharing peripherals and other network devices. The exchange of information is the primary purpose of a computer network. Networking is a practical necessity within information systems, because it significantly increases their usability and business value. New business methods and modern ways of working require a fast, reliable and uninterrupted flow of data that can be accessed from different devices (PCs, tablets, smartphones, etc.). Such interconnected devices and systems demand high-performance, reliable and secure networks. Security risks (such as intrusions and business interruptions) have existed in information and communication systems from the very beginning, and they can only be tackled through appropriate prevention and defense mechanisms. One of these mechanisms is proper system and network monitoring, which lays the foundation for a well-functioning system.

Monitoring systems help keep a system up and running and increase its availability. They are deployed in networks of all sizes, often after system administrators realize that problems tend to occur precisely when the responsible person (or team) is least prepared to solve them. A properly configured monitoring system can ensure that the network is available and functional, and can even help predict and prevent future incidents.

Zabbix is software that can monitor not only various simple network parameters, but also the health and integrity of servers and the availability of services. It uses a flexible notification mechanism that allows users to configure alerts for any event, with e-mail, SMS or a voice call as the media channel. This makes it possible to react quickly to server problems and various network-related incidents. Reporting and data visualization features based on the stored data are also included, which makes Zabbix well suited to baseline load overview and capacity planning.

Zabbix supports both polling and trapping as ways of collecting data from hosts. All Zabbix reports and statistics, as well as configuration parameters, are accessed through a web-based frontend. This web interface makes it possible to assess the status of the network and the health of monitored servers from any location and on any type of device. When properly configured, Zabbix can play an important role in monitoring IT infrastructure of any size, which makes it convenient both for small organizations with only a few servers and for large companies with a multitude of servers [9].

Zabbix, the main focus of this thesis, is an open-source solution written by Alexei Vladishev. The following chapters describe the installation and configuration of a centralized monitoring solution for web-server management, monitoring, alerting and reporting. Zabbix also offers many other features, such as data visualization, capacity planning and integration with the surrounding infrastructure. These are demonstrated as well, to show the full potential of this distributed monitoring solution.

The goal of this thesis is to emphasize the need to deploy a suitable monitoring system in environments of various sizes. Zabbix was chosen because of its vast array of features, its flexibility and its ability to adapt to various types of environments. Each of the following chapters contains a short demonstration of the features it describes. The chapters start with server installation and configuration and continue through distributed monitoring, proxies, the different ways of collecting and visualizing data and templates. They end with reporting, the API and third-party tools.

The chapter “Distributed monitoring systems” demonstrates the necessity of proper system monitoring. It emphasizes a comparison between Zabbix and similar monitoring solutions and gives an overview of their features. The chapter “Zabbix installation” describes how the Zabbix server and agent are installed on various platforms. “Monitoring with Zabbix” describes auto-discovery, low-level discovery, the application of high-availability principles and the setup of a distributed system. How the data is handled, processed and visualized is described in Chapter 5, “Handling data”, which also explains Zabbix’s data flow, the common use of templates and useful reporting features. The chapter “Integration” describes how Zabbix’s functionality can be extended with third-party applications, templates and a powerful API. It also provides examples of possible use-cases through integration with automation tools (Ansible) and alerting services (PagerDuty). The conclusion is given in Chapter 7.

2. Distributed monitoring systems

System and network monitoring can be defined as the process of collecting and storing state data about a system. That system can be a single computer, a couple of network devices or a complex network of many interconnected devices. Many products of this kind exist on the market, and they can broadly be grouped into two categories: commercial and open-source software. Some open-source solutions, such as Nagios, Zabbix and Zenoss, have become very popular in their communities.

2.1. The importance of monitoring

A properly set up monitoring system can play an important role in many of a company's IT-related processes. Users demand a high percentage of uptime (service availability), and service providers should do everything they can to keep the system available as much of the time as possible. It is also important to make users aware of the alerting process when an incident occurs.

In the past, when a server suffered a hardware or software malfunction that caused service downtime, a user would report the error to the designated person. Automated alerts vastly improved this process. If properly set up, they are much faster than the users, so the person in charge learns about the problem before the users notice it. Ideally, the issue is prevented altogether because the right person is notified in time to take appropriate action. This also shortens the response time of the engineer or administrator in charge and reduces downtime.

The main premise of proper monitoring is simple and straightforward: if a metric crosses a predefined threshold, notify the person in charge. To avoid alerts in the middle of the night, a well-configured monitoring system periodically checks a predefined set of parameters on a remote host and gives early warnings. If the thresholds are set correctly, the monitoring system alerts the person in charge (with an e-mail or a text message) when a metric is nearing a “Problem” state. If the warning is ignored and the metric goes into a “Critical” state, the monitoring system can send a different kind of alert (usually a call generated through text-to-speech). Recovery alerts can also be sent when the metric returns from a “Problem” state to an “OK” state.
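The threshold logic can be sketched in a few lines of Python, purely for illustration. The sketch assumes a single metric (for example CPU utilization in percent) and two hypothetical thresholds: 80 for the warning level and 95 for the critical level. The mapping of states to media channels is also hypothetical. Alerts are emitted only on state changes, so a metric that stays in the same state does not generate repeated notifications. This is not how Zabbix evaluates triggers internally. In Zabbix, the thresholds would be written as trigger expressions and the channels configured as actions and media.

```python
from enum import Enum


class State(Enum):
    OK = "OK"
    PROBLEM = "Problem"
    CRITICAL = "Critical"


def classify(value: float, warn: float, crit: float) -> State:
    """Map a metric value onto a state using two (hypothetical) thresholds."""
    if value >= crit:
        return State.CRITICAL
    if value >= warn:
        return State.PROBLEM
    return State.OK


# Hypothetical media per target state: a softer channel for a warning,
# a voice call for a critical state, a recovery message when back to OK.
CHANNEL = {
    State.PROBLEM: "email",
    State.CRITICAL: "voice call",
    State.OK: "recovery email",
}


def alerts(samples, warn=80.0, crit=95.0):
    """Yield (value, new_state, channel) only when the state changes."""
    previous = State.OK
    for value in samples:
        current = classify(value, warn, crit)
        if current != previous:
            yield value, current, CHANNEL[current]
            previous = current


if __name__ == "__main__":
    for event in alerts([50, 85, 90, 97, 60]):
        print(event)
```

For the samples 50, 85, 90, 97 and 60, the sketch emits three alerts: an e-mail at 85, a voice call at 97 and a recovery message at 60.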

Distributed monitoring systems can use more than one server to distribute the load of network monitoring. This setup is typical of large enterprise environments, where one central monitoring system must keep track of multiple network segments or even different geographical regions. Most distributed monitoring systems use some kind of proxy to collect raw data from the different network segments, while the gathered data is processed on a central monitoring node.

2.2. Zabbix

Since it was first released to the public in 2001, Zabbix has been marketed as a powerful and effective monitoring solution. Zabbix is an open-source package that is easy to obtain and deploy. It is distributed as a compact package with low hardware and software requirements. Since it is relatively easy to use, it can be a very good contender even for small environments with a tight budget. On the other hand, Zabbix's scalability and distributed architecture really stand out in environments with a large number of monitored objects and with complex configurations and dependencies [1].

Each new Zabbix release is subject to a standard life cycle with an expiry date. Life cycles make the content of new releases more predictable and manageable for Zabbix users. The latest release of Zabbix, version 3.0.4, was released on 22 July 2016. The last major release, version 3.0, has been available since 16 February 2016 and has Full Support until February 2019. The end of Limited Support is scheduled for February 2021. Full Support covers fixes for general, critical and security issues, while Limited Support covers critical and security issues only. Zabbix does not guarantee any code fixes for older or non-stable releases [10].

The Zabbix team has committed to a schedule of releases on a six-month basis, which means a new stable version is released every six months. In keeping with that schedule, the next standard release (version 3.2) was announced for August 2016, with Full Support until February 2017 and Limited Support until March 2017.

With regard to its features, Zabbix can gather various types of data from the network. High-performance real-time monitoring means that tens of thousands of servers, virtual machines and other network devices can be monitored simultaneously. In addition to storing the data, users can create various visualizations (such as overviews, maps, graphs and screens) to better understand and correlate the data gathered from the network. Zabbix also offers flexible ways of analyzing data for the purpose of alerting. Its main capabilities are:

● it can monitor all main protocols (HTTP, FTP, SSH, POP3, SMTP, SNMP, etc.);
● its native agent is available for all major operating systems (Windows, OS X, Linux, FreeBSD, etc.);
● it supports multi-step web application monitoring (checking content, latency and speed);
● it simplifies adding hosts through various templates;
● it can monitor log files and reboots;
● it offers auto-discovery (with automatically added resource monitoring);
● it can monitor a host with or without the Zabbix agent;
● perhaps most importantly, its functionality can be extended through its API.

However, like all other solutions, Zabbix has its weaknesses and downsides. These include potentially fast database growth (if not tuned correctly), a steep learning curve, a smaller support community and fewer community-written plugins, all compared to Nagios.

Before diving into the specifics of Zabbix, it is useful to understand the following common terms, listed alphabetically [5,6]:

● action - a predefined means of reacting to an event. An action consists of conditions (when the operation is carried out) and operations (e.g. sending a notification);
● application - a logical grouping of items;
● escalation - a custom scenario for executing operations within an action, such as a sequence of sending notifications or executing remote commands;
● event - a single occurrence of something that deserves attention, such as a trigger changing state or a discovery/agent auto-registration taking place;
● frontend - the web interface provided with Zabbix;
● host - a networked device to be monitored, defined by an IP address or a DNS record;
● host group - a logical grouping of hosts (it may contain hosts and templates). Hosts and templates within a host group are not linked to each other in any way. Host groups are used when assigning access rights to hosts for different user groups;
● item - a particular piece of data that a server receives from a host (commonly referred to as a “metric”);
● media - a means of delivering notifications (delivery channel);
● notification - a message about an event, sent to a user via the chosen media channel;
● remote command - a predefined command that is automatically executed on a monitored host when a certain condition is met;
● template - a set of entities (items, triggers, graphs, screens, applications, low-level discovery rules, web scenarios) ready to be applied to one or several hosts;
● trigger - a logical expression that defines a problem threshold and is used to evaluate data received in items. When the received data cross the defined threshold, the trigger changes from an “OK” to a “Problem” state, or back;
● web scenario - one or several HTTP requests used to check the availability of a website;
● Zabbix agent - a process deployed on monitoring targets to actively monitor local resources and applications;
● Zabbix API - an interface that uses the JSON-RPC protocol to create, update and fetch Zabbix objects (hosts, items, graphs and others) or to perform other custom tasks;
● Zabbix proxy - a process that collects data on behalf of a Zabbix server, taking some of the processing load off the server;
● Zabbix server - the central process of Zabbix software that performs monitoring, interacts with Zabbix proxies and agents, calculates triggers and sends notifications; it is also the central repository of data.
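The Python sketch below makes the API entry more concrete. It serializes two requests in the JSON-RPC 2.0 shape documented for the Zabbix 3.0 API:

● a `user.login` call, which returns a session token;
● a `host.get` call, which carries that token in the `auth` field.

In a real deployment, these requests would be sent with HTTP POST and the `Content-Type: application/json-rpc` header to the frontend's `api_jsonrpc.php` script. The URL below is hypothetical. `Admin`/`zabbix` are the default frontend credentials, which should be changed, and `TOKEN` is a placeholder. Later Zabbix versions changed some details (for example, the login parameter name), so this reflects the 3.0-era format only.

```python
import itertools
import json
from typing import Optional

# Hypothetical frontend address; requests would be POSTed here.
API_URL = "http://zabbix.example.com/zabbix/api_jsonrpc.php"

# JSON-RPC request IDs only need to be unique per client; a counter suffices.
_request_ids = itertools.count(1)


def rpc_request(method: str, params: dict, auth: Optional[str] = None) -> str:
    """Serialize one JSON-RPC 2.0 request in the Zabbix 3.0 API format.

    `auth` is null for user.login and carries the session token afterwards.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": auth,
        "id": next(_request_ids),
    })


if __name__ == "__main__":
    # Step 1: log in; the response's "result" field would hold the token.
    print(rpc_request("user.login", {"user": "Admin", "password": "zabbix"}))
    # Step 2: list host IDs and technical names using that token.
    print(rpc_request("host.get", {"output": ["hostid", "host"]}, auth="TOKEN"))
```

Only the request bodies are built here. The HTTP transport and the handling of the `result` and `error` fields in the responses are left out.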

Practically everything on the network can be monitored: the performance and availability of servers, web applications, databases, networking equipment and more. Zabbix can scale to very large environments by using Zabbix proxies for distributed monitoring. Proxies are useful when the environment extends over multiple sites or geographical regions. In that case, each site can have its own proxy (a local Zabbix monitor). The proxy takes load off the main Zabbix server and keeps collecting data even if the connection to the main server is interrupted or severed.
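The store-and-forward behaviour described above can be sketched as a simple buffer. The Python class below is a hypothetical model, not Zabbix code. A real proxy keeps collected values in its own local database, and its retention is governed by configuration parameters such as `ProxyOfflineBuffer`. This sketch simply caps the number of buffered values and discards the oldest ones first. Each value keeps the timestamp of its collection, so data gathered during an outage still appears at the right point in time once it reaches the server. The host name and item key in the usage part are illustrative.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Value:
    host: str
    key: str
    value: str
    clock: int  # collection time (Unix seconds), not upload time


class ProxyBuffer:
    """Hypothetical store-and-forward buffer of a remote-site proxy."""

    def __init__(self, max_values: int = 10_000):
        # A bounded deque silently drops the oldest entry when full.
        self.pending = deque(maxlen=max_values)

    def collect(self, host: str, key: str, value, clock: Optional[int] = None) -> None:
        ts = int(time.time()) if clock is None else clock
        self.pending.append(Value(host, key, str(value), ts))

    def flush(self, send: Callable[[Value], bool]) -> int:
        """Forward values oldest-first; stop at the first failed send and keep the rest."""
        sent = 0
        while self.pending:
            if not send(self.pending[0]):
                break  # server unreachable: retry on the next flush
            self.pending.popleft()
            sent += 1
        return sent


if __name__ == "__main__":
    proxy = ProxyBuffer()
    link_up = False  # simulated state of the link to the central server
    proxy.collect("branch-router", "icmpping", 1, clock=1_470_000_000)
    proxy.collect("branch-router", "icmpping", 1, clock=1_470_000_060)
    print(proxy.flush(lambda v: link_up), len(proxy.pending))  # 0 2
    link_up = True
    print(proxy.flush(lambda v: link_up), len(proxy.pending))  # 2 0
```

While the simulated link is down, nothing is lost. Once it comes back, the backlog is delivered in collection order.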

For an improved user experience, Zabbix comes with a web-based interface, secure user authentication and a flexible user permission scheme. As mentioned earlier, both polling and trapping are supported, with native high-performance agents gathering data from any popular operating system. Agentless monitoring methods are available as well, but with limited options.

Zabbix can also monitor websites and virtual infrastructure (based on VMware, KVM, Xen, Docker, LXC containers or any other similar technology), mostly through low-level discovery. Discovery features enable Zabbix to automatically find network servers and devices and to assign performance and availability checks to newly discovered hosts.

2.3. Other distributed monitoring systems

Since a number of distributed monitoring systems are available on the market (both commercial and open-source), comparing their features helps when choosing a new or replacement monitoring system for an environment. The most common features found in such systems are auto discovery, agentless monitoring, SNMP monitoring, SLA reports, logical grouping, trending, trend prediction, syslog monitoring, plugins, triggers/alerts, a web application (with full or limited control), distributed monitoring, inventory, maps and access control. Table 2.1 compares six of the most popular open-source monitoring systems: Cacti, Icinga, Munin, Nagios, Zabbix and Zenoss.

| Name | Cacti | Icinga | Munin | Nagios | Zabbix | Zenoss |
|---|---|---|---|---|---|---|
| IP SLA Reports | Yes | Via plugin | No | Via plugin | Yes | Yes |
| Logical Grouping | Yes | Yes | Yes | Yes | Yes | Yes |
| Trending | Yes | Yes | Yes | Yes | Yes | Yes |
| Trend Prediction | Yes | No | Yes | No | Yes | Yes |
| Auto Discovery | Via plugin | Via plugin | No | Via plugin | Yes | Yes |
| Agentless | Yes | Supported | No | Supported | Supported | Supported |
| SNMP | Yes | Via plugin | Yes | Via plugin | Yes | Yes |
| Syslog | Yes | Via plugin | No | Via plugin | Yes | Yes |
| Plugins | Yes | Yes | Yes | Yes | Yes | Yes |
| Triggers / Alerts | Yes | Yes | Partial | Yes | Yes | Yes |
| Web interface | Full Control | Full Control | Viewing | Yes | Full Control | Full Control |
| Distributed Monitoring | Yes | Yes | Via nodes | Yes | Yes | Yes |
| Inventory | Yes | Via plugin | Unknown | Via plugin | Yes | Yes |
| Platform | PHP | C | Perl | C, PHP | C, PHP | Python, Java |
| Data Storage Method | RRDtool, MySQL | MySQL, PostgreSQL, Oracle Database | RRDtool | Flat file, SQL | Oracle, MySQL, PostgreSQL, IBM DB2, SQLite | ZODB, MariaDB, Apache HBase |
| License | GPL | GPL | GPL | GPL | GPL | Free Core GPL, Commercial Enterprise |
| Maps | Plugin | Yes | Unknown | Yes | Yes | Yes |
| Access Control | Yes | Yes | Unknown | Yes | Yes | Yes |
| IPv6 | Yes | Yes | Yes | Yes | Yes | Yes |
| Latest release date | 2016-05 | 2015-07 | 2014-11 | 2015-08 | 2016-07 | 2016-03 |
| Latest release version | 0.8.8h | 1.13.3 | 2.0.25 | 4.1.1 | 3.0.4 | 5.1.1 |

Table 2.1 A comparison of similar monitoring systems

All of the solutions mentioned above are open-source and encourage the usage of a Linux

distribution for the operating system. An important prerequisite for a proper setup is that

the system administrator (or any other person responsible for setting up the monitoring

system) has a decent level of experience in Linux system administration and a working

knowledge of the selected operating system.

Finally, when it comes to implementing a monitoring system, the choice usually boils down to Zabbix or the de facto industry standard, Nagios. Both can cover the majority of environments, but the feature overview (see Table 2.1) can tip the scale in favor of one of them. While Nagios has features like “Flapping” detection (the condition when the state of a host or service changes frequently within a short time span) and automatic topology display, choosing Nagios over Zabbix has multiple disadvantages. First of all, not only does Nagios lack native support for features like Auto Registration, Auto Discovery, Aggregate Graphs, Distributed Monitoring, Windows Service Discovery and Native JMX Support, but it also has a couple of distinct downsides. The first is that it requires SSH access or an add-on (NRPE, the Nagios Remote Plugin Executor) to monitor a remote system; the second is that its web application (web interface) is mostly read-only (on the Nagios web frontend, one can acknowledge problems, disable alerts and reschedule checks, but a new host or service cannot be added).

Because new hosts are added through Nagios’ configuration files, one more remark can be made about the Nagios/Zabbix dilemma: Nagios is usually an appropriate solution for environments where resources do not change frequently, since editing configuration files on a daily basis can be a real hassle. Most importantly, there is nothing in Nagios which cannot be done in Zabbix.


3. Zabbix installation

Zabbix is an enterprise-class open-source distributed monitoring solution for servers, network services and network devices. Zabbix is a server-agent type of monitoring software, which means that the system consists of a Zabbix server, where all gathered data is collected, and Zabbix agents running on the monitored hosts. All Zabbix data, including configuration and data collected from hosts, is stored in a relational database (MySQL, PostgreSQL or Oracle) on the server. The Zabbix server can run on all UNIX/Linux distributions, while Zabbix agents are available for Linux, UNIX (AIX, HP-UX, Mac OS X, Solaris, FreeBSD), NetWare and Windows; network devices can be monitored through SNMP v1, v2 and v3.

There is a reason for limiting the server component to Unix variants, as the Zabbix manual

[5] explains: "Due to security requirements and mission-critical nature of monitoring

server, Unix is the only operating system that can consistently deliver the necessary

performance, fault tolerance and resilience."

3.1. Server installation

According to the Zabbix manual [5], the Zabbix server is tested on the following platforms:

● Linux;

● IBM AIX;

● FreeBSD;

● NetBSD;

● OpenBSD;

● HP-UX;

● Mac OS X;

● Solaris.

The Zabbix server software consists of a server daemon (a process running in the background) and a web interface. Both connect to the same database, but they do not communicate with each other directly. The Zabbix daemon is the central Zabbix process. It periodically polls clients running a Zabbix agent for updated statistics and saves the information into the database. Beyond that, the Zabbix daemon also collects SNMP data and performs housekeeping functions such as purging old data (according to the predefined housekeeping intervals). The other core element of Zabbix is the PHP-based web interface, which is used for configuration and administration, as well as for easy access to data about monitored services and hosts.

3.1.1. Requirements

The Zabbix manual [5] offers several examples of hardware configurations based on the

size of the monitored network.

| Name | Platform | CPU/Memory | Database engine | Number of monitored hosts |
|---|---|---|---|---|
| Small | CentOS | Virtual Appliance | MySQL InnoDB | 100 |
| Medium | CentOS | 2 CPU cores, 2GB RAM | MySQL InnoDB | 500 |
| Large | RedHat Enterprise Linux | 4 CPU cores, 8GB RAM, RAID10 | MySQL InnoDB or PostgreSQL | >1000 |
| Very large | RedHat Enterprise Linux | 8 CPU cores, 16GB RAM, Fast RAID10 | MySQL InnoDB or PostgreSQL | >10000 |

Table 3.1 Hardware requirements according to environment size

As shown in Table 3.1, for an environment of up to 100 hosts, the Zabbix manual [5] recommends the installation of a Zabbix appliance. It can be used as

an alternative to setting up the server manually or reusing an existing server for Zabbix. A

Zabbix appliance installation CD can be used for the instant deployment of a Zabbix

server or Zabbix proxy with a preferred database engine (MySQL or PostgreSQL for

server and MySQL or SQLite3 for proxy). As the environment grows, it is common to use

more memory and more CPUs to match the number of monitored hosts.

In terms of software, a Zabbix server relies on an Apache web server, a database engine and the PHP scripting language. The supported database engines are shown in Table 3.2.

| Software | Version | Comments |
|---|---|---|
| MySQL | 5.0.3 or later | Required if MySQL is used as Zabbix backend database. InnoDB engine is required. |
| Oracle | 10g or later | Required if Oracle is used as Zabbix backend database. |
| PostgreSQL | 8.1 or later | Required if PostgreSQL is used as Zabbix backend database. |
| SQLite | 3.3.5 or later | Required if SQLite is used as Zabbix backend database. |
| IBM DB2 | 9.7 or later | Required if IBM DB2 is used as Zabbix backend database. |

Table 3.2 Possible database engines [5]

The Zabbix manual [5] makes two remarks about choosing the database engine: the support for IBM DB2 is experimental, and using SQLite3 with a Zabbix server is not recommended. While SQLite3 can be used with Zabbix proxies without a problem, simultaneous access to the database by the server and the frontend may even lead to database corruption.
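Checking an installed database server against Table 3.2 amounts to a numeric version comparison. The Python sketch below assumes the bare version string is obtained separately (for instance from SELECT VERSION() in MySQL); the engine keys are names chosen for this example.

```python
# Minimum versions from Table 3.2; Oracle's "10g" is compared as version 10.
MINIMUM_VERSIONS = {
    "mysql": "5.0.3",
    "oracle": "10g",
    "postgresql": "8.1",
    "sqlite": "3.3.5",
    "ibm_db2": "9.7",
}


def parse_version(text):
    """Turn '5.1.73-log' into (5, 1, 73), stopping at the first non-digit."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # a suffix such as '-log' or 'g' ends the numeric part
    return tuple(parts)


def meets_minimum(engine, installed_version):
    """True if installed_version satisfies the Table 3.2 minimum for engine."""
    required = parse_version(MINIMUM_VERSIONS[engine])
    installed = parse_version(installed_version)
    # Pad with zeros so that e.g. (8, 1) and (8, 1, 0) compare as equal.
    width = max(len(required), len(installed))
    required += (0,) * (width - len(required))
    installed += (0,) * (width - len(installed))
    return installed >= required


print(meets_minimum("mysql", "5.1.73"))  # True
```

Comparing integer tuples rather than strings avoids the classic pitfall where "10" sorts before "9" lexicographically.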

Table 3.3 shows the list of software required to run the PHP-based web frontend.

| Software | Version | Comments |
|---|---|---|
| Apache | 1.3.12 or later | |
| PHP | 5.4.0 or later | PHP v7 is not supported yet. |
| PHP extensions: | | |
| gd | 2.0 or later | The PHP GD extension must support PNG images (--with-png-dir), JPEG images (--with-jpeg-dir) and FreeType 2 (--with-freetype-dir). |
| bcmath | | php-bcmath (--enable-bcmath) |
| ctype | | php-ctype (--enable-ctype) |
| libXML | 2.6.15 or later | php-xml or php5-dom, if provided as a separate package by the distributor. |
| xmlreader | | php-xmlreader, if provided as a separate package by the distributor. |
| xmlwriter | | php-xmlwriter, if provided as a separate package by the distributor. |
| session | | php-session, if provided as a separate package by the distributor. |
| sockets | | php-net-socket (--enable-sockets). Required for user script support. |
| mbstring | | php-mbstring (--enable-mbstring) |
| gettext | | php-gettext (--with-gettext). Required for translations to work. |
| ldap | | php-ldap. Required only if LDAP authentication is used in the frontend. |
| ibm_db2 | | Required if IBM DB2 is used as a Zabbix backend database. |
| mysqli | | Required if MySQL is used as a Zabbix backend database. |
| oci8 | | Required if Oracle is used as a Zabbix backend database. |
| pgsql | | Required if PostgreSQL is used as a Zabbix backend database. |
| sqlite3 | | Required if SQLite is used as a Zabbix backend database. |

Table 3.3 Zabbix software requirements [5]
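Whether a PHP installation satisfies Table 3.3 can be checked against the list of loaded modules printed by php -m. The Python sketch below treats gettext, ldap and sockets as optional (as the table's comments indicate) and takes the output of php -m as a string, e.g. from subprocess.run(["php", "-m"], capture_output=True, text=True).stdout; note that the PHP command line and Apache may load different php.ini files, so the result is indicative only. The engine names are chosen for this example.

```python
# Extensions that Table 3.3 lists without a condition.
ALWAYS_REQUIRED = {"gd", "bcmath", "ctype", "libxml", "xmlreader",
                   "xmlwriter", "session", "mbstring"}
# One database extension, depending on the backend database.
DATABASE_EXTENSION = {"mysql": "mysqli", "oracle": "oci8",
                      "postgresql": "pgsql", "sqlite": "sqlite3",
                      "ibm_db2": "ibm_db2"}


def missing_extensions(php_m_output, database="mysql"):
    """Return the required extensions absent from `php -m` output, sorted."""
    loaded = set()
    for line in php_m_output.splitlines():
        name = line.strip()
        # Skip blank lines and section headers such as "[PHP Modules]".
        if name and not name.startswith("["):
            loaded.add(name.lower())
    required = ALWAYS_REQUIRED | {DATABASE_EXTENSION[database]}
    return sorted(required - loaded)


sample = ("[PHP Modules]\nbcmath\nctype\ngd\nlibxml\nmbstring\nmysqli\n"
          "session\nxmlreader\n\n[Zend Modules]\n")
print(missing_extensions(sample))  # ['xmlwriter']
```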


When accessing the Zabbix web interface, cookies and JavaScript must be enabled on

the client side. The latest versions of Google Chrome, Mozilla Firefox, Microsoft Internet

Explorer and Opera are officially supported. According to the Zabbix documentation, other

browsers (Apple Safari, Konqueror) may work with Zabbix as well [5].

3.1.2. Space usage

Total disk space needed for the monitoring system depends on the size of the monitored

environment. While Zabbix configuration data require a fixed amount of disk space and do

not grow much over time, the size of the Zabbix database and the amount of stored historical data depend mainly on the following variables:

● The number of processed values per second;

● Housekeeper settings for history;

● Housekeeper settings for trends;

● Housekeeper settings for events.

The number of processed values per second

This is the average number of new values the Zabbix server receives every second. For example, if a network has 2000 items for monitoring with a refresh rate of 60 seconds, the number of values per second is 2000/60 ≈ 33.3, which means that about 33 new values are added to the Zabbix database every second.

Housekeeper settings for history

Zabbix keeps values for a fixed period of time (several weeks or months). Each new value

requires a certain amount of disk space for data and index.

If the system keeps 30 days of history and receives 33 values per second, the total number of values will be (30*24*3600)*33 = 85,536,000, i.e., about 85.5 million values. Depending on the database engine and the type of received values, the disk space required for keeping a single value may vary from 40 bytes to several hundred bytes. If an average size of 90 bytes per value for numeric items is used for this calculation, 85.5 million values will require 85,536,000 * 90 bytes = 7,698,240,000 bytes, or about 7.2GB of disk space (taking 1GB as 1024^3 bytes).
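The history estimate can be reproduced with a few lines of Python. The inputs (2000 items, a 60-second interval, 30 days, 90 bytes per value) are those used above, and the result is reported in binary gigabytes (GiB), which is how the 7.2GB figure comes out.

```python
def values_per_second(items, refresh_interval_s):
    """Average number of new values the server receives per second."""
    return items / refresh_interval_s


def history_bytes(days, vps, bytes_per_value):
    """Disk space for `days` of history at `vps` values per second."""
    return days * 24 * 3600 * vps * bytes_per_value


vps = values_per_second(2000, 60)    # 33.33...
history = history_bytes(30, 33, 90)  # the text rounds vps down to 33
print(f"{vps:.1f} values/s, {history:,} bytes = {history / 1024**3:.1f} GiB")
# prints: 33.3 values/s, 7,698,240,000 bytes = 7.2 GiB
```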

Housekeeper settings for trends

Zabbix keeps a 1-hour maximum/minimum/average set of values for each item. This data is used for trend prediction and long-period graphs. The one-hour period is hard-coded. If the database requires about 90 bytes per trend record and trend data is kept for 5 years, 2000 items will require 2000*24*365*90 = 1,576,800,000 bytes, or about 1.5GB per year, which amounts to about 7.5GB for 5 years.

Housekeeper settings for events

Each Zabbix event requires approximately 170 bytes of disk space [5]. Since it is difficult to estimate the daily number of events generated by Zabbix, the worst-case scenario assumes that Zabbix generates one event per second, which leads to 3*365*24*3600*170 bytes ≈ 15GB if the event data is kept for 3 years.

When all of the above is taken into consideration, the total required disk space can be calculated as:

$$\text{Total} = \text{Configuration} + \text{History} + \text{Trends} + \text{Events} \qquad (3.1)$$

According to (3.1), the estimated total space required to run a Zabbix server with 2000 items for the next 5 years would be around 40GB. The disk space will not be used immediately after installing Zabbix; rather, the amount of used disk space will increase over time up to the calculated estimate of 40GB (if estimated correctly). Housekeeper settings can have a significant impact on the database growth trend and disk space usage on the server over time.
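Equation (3.1) can be evaluated with the figures above in a short Python sketch. The configuration term is an assumption (taken here as about 10MB); the other inputs are the ones used in this section. With these inputs the four terms add up to roughly 30 GiB, so the 40GB estimate leaves some headroom above the computed components.

```python
GIB = 1024 ** 3


def trends_bytes(items, days, bytes_per_record=90):
    """One hourly min/max/avg record per item (the hour is hard-coded)."""
    return items * days * 24 * bytes_per_record


def events_bytes(days, events_per_second=1, bytes_per_event=170):
    """Worst case from the text: one event per second."""
    return days * 24 * 3600 * events_per_second * bytes_per_event


configuration = 10 * 1024 ** 2          # assumption: about 10MB of configuration
history = 30 * 24 * 3600 * 33 * 90      # 30 days of history at 33 values/s
trends = trends_bytes(2000, 5 * 365)    # 5 years of trends for 2000 items
events = events_bytes(3 * 365)          # 3 years of events
total = configuration + history + trends + events  # equation (3.1)

for name, size in [("history", history), ("trends", trends),
                   ("events", events), ("total", total)]:
    print(f"{name:<8}{size / GIB:6.1f} GiB")
```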

3.1.3. Time synchronization

When deploying a high-performance distributed monitoring system, it is very important to

have a precise system time and date on a Zabbix server. The Network Time Protocol daemon (ntpd) is the most popular daemon that synchronizes the host's time with that of

other machines. It is strongly advised to maintain a synchronized system date on all

systems that Zabbix components are running on.

The NTP daemon is installed on most managed servers; it periodically checks whether the machine's clock matches the time reported by the configured NTP servers and adjusts it if necessary. Examples of regional NTP servers are zg1.ntp.carnet.hr (Croatia) and si.pool.ntp.org (Slovenia).
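To see how a clock difference is measured in the first place, the following Python sketch computes the per-sample clock offset and round-trip delay that NTP (RFC 5905) derives from the four timestamps of one request/response exchange. ntpd additionally filters and combines many such samples, which is not modelled here, and the timestamps in the example are invented.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Clock offset and round-trip delay from one NTP request/response.

    t0: request sent      (client clock)
    t1: request received  (server clock)
    t2: response sent     (server clock)
    t3: response received (client clock)
    A positive offset means the client clock is behind the server."""
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Client clock 2 s behind; 50 ms each way on the network, 10 ms at the server.
offset, delay = ntp_offset_and_delay(100.000, 102.050, 102.060, 100.110)
print(f"offset={offset:.3f} s, delay={delay:.3f} s")
```

Subtracting the server's processing time (t2 - t1) from the round trip is what lets the formula separate network delay from clock offset, assuming the network path is roughly symmetric.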

3.1.4. The installation process

A Zabbix server can be installed either from the source or from most distributions'

software management tools, which is usually decided by the administrator's preferences.

As always, both methods have their advantages and disadvantages. Installing from the source allows heavy customization but can take a lot of time and effort, while installing from the package manager is easier but offers less customization. Also, when installing from the package manager, all dependencies are resolved automatically, while during an installation from the source this has to be done manually. Since the operating system chosen for this demonstration is CentOS 6, and because of easy maintenance and a simple upgrade process, it is advisable to install Zabbix with a package manager, which in this case is “yum”.

For the purpose of this demonstration, Zabbix will be installed on a virtual machine

(created in VMware ESXi, a free hypervisor developed by VMware for deploying and

serving virtual machines). The resources of the virtual machine are 2GB of RAM and

12GB of disk space. The operating system is a minimal installation of a 64-bit CentOS 6.8 (downloaded from http://mirror.centos.plus.hr/centos/6.8/isos/x86_64/CentOS-6.8-x86_64-minimal.iso).

First of all, the Zabbix server should have a dedicated static IP address and the operating

system should be properly set up and updated. To keep this demonstration as simple as

possible, iptables (a Linux firewall) and SELinux (Security-Enhanced Linux – a Linux

kernel security module) will be disabled.

The installation of Zabbix starts with enabling an additional software repository in order to

get the newest version available (for the selected operating system).

# rpm -ivh http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/zabbix-release-3.0-1.el6.noarch.rpm

After this, the Zabbix packages can be installed (MySQL is used as the database engine):

# yum install zabbix-server-mysql zabbix-web-mysql

Since the packages are installed via “yum”, dependencies are automatically resolved and appended to the installation transaction. The preferred database engine should be installed separately (for example, with the command “yum install mysql-server”) and started as a service. The command “mysql_secure_installation” should be executed to set the password for the MySQL “root” user, remove anonymous users, disallow remote root login and drop the test database. After that, a new database named “zabbix” should be created (the Zabbix manual uses the statement “create database zabbix character set utf8 collate utf8_bin;”). The Apache web server will be used to run the PHP-based Zabbix frontend, so all required packages must also be installed:

# yum install httpd php php-cli php-common php-devel php-pear php-gd php-mbstring php-mysql php-xml

The httpd service should then be started with

# /etc/init.d/httpd start


In order to prepare the database for Zabbix, the initial schema and data must be imported with the following commands (the compressed file create.sql.gz contains the SQL queries required to create the database schema and insert the initial data; since a MySQL root password was set earlier, the -p option makes mysql prompt for it):

# cd /usr/share/doc/zabbix-server-mysql-3.0.4

# zcat create.sql.gz | mysql -uroot -p zabbix

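The Zabbix database user, zabbixusr, should be granted all privileges on the zabbix database, which can be done with a simple SQL command (the host part 'localhost' matches the DBHost=localhost setting used below, with which the MySQL client library connects through the local socket):

mysql> grant all on zabbix.* to 'zabbixusr'@'localhost' identified by 'zabbixusersuperstrongpassword';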

User-specific data must be entered into /etc/zabbix/zabbix_server.conf so that Zabbix is able to connect to the database:

# vi /etc/zabbix/zabbix_server.conf

DBHost=localhost
DBName=zabbix
DBUser=zabbixusr
DBPassword=zabbixusersuperstrongpassword
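Before the server is started, the database settings can be checked with a small Python helper. It is a sketch that assumes the simple Key=Value syntax shown above, with # starting a comment line; it does not reproduce every rule of the Zabbix configuration parser (for example, the handling of Include directives).

```python
def parse_conf(text):
    """Parse 'Key=Value' lines, ignoring blank lines and '#' comments.

    If a key appears twice, this sketch keeps the last value."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_db_settings(settings,
                        keys=("DBHost", "DBName", "DBUser", "DBPassword")):
    """Return the database parameters that are absent or empty."""
    return [key for key in keys if not settings.get(key)]


# Typical use on the server:
#   with open("/etc/zabbix/zabbix_server.conf") as f:
#       print(missing_db_settings(parse_conf(f.read())))
example = """# DBPassword=
DBHost=localhost
DBName=zabbix
DBUser=zabbixusr
"""
print(missing_db_settings(parse_conf(example)))  # ['DBPassword']
```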

In the next step, the zabbix-server service is started:

# /etc/init.d/zabbix-server start

The Apache configuration file for the Zabbix frontend is located in

/etc/httpd/conf.d/zabbix.conf. As suggested by the Zabbix manual, the following PHP settings are used for the Zabbix frontend:

php_value max_execution_time 300

php_value memory_limit 128M

php_value post_max_size 16M

php_value upload_max_filesize 2M

php_value max_input_time 300

php_value date.timezone Europe/Zagreb

The max_execution_time setting sets the maximum time in seconds that a script is allowed to run before it is terminated by the parser. The memory_limit setting sets the maximum amount of memory a script is allowed to allocate (128M means 128 megabytes), which helps prevent poorly written scripts from taking up all available memory on a server. The post_max_size setting determines the maximum size of POST data allowed, which also affects file uploads, while upload_max_filesize limits the size of a single uploaded file. The max_input_time setting sets the maximum time in seconds a script is allowed to parse input data (such as POST and GET data); timing begins when PHP is invoked at the server and ends when execution begins. Finally, date.timezone sets the default time zone used by all date and time functions (which is important when processing data and sending alerts).
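PHP accepts size values in shorthand notation, where K, M and G stand for multiples of 1024, 1024*1024 and 1024*1024*1024 bytes. The following Python sketch converts the php_value lines shown above into bytes and checks one relationship the PHP documentation points out, namely that post_max_size should be larger than upload_max_filesize. Parsing only three-field php_value lines is a simplification made for this example.

```python
SHORTHAND = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(value):
    """Convert PHP shorthand such as '128M' to a number of bytes."""
    value = value.strip()
    multiplier = SHORTHAND.get(value[-1].lower())
    if multiplier:
        return int(value[:-1]) * multiplier
    return int(value)


def php_values(apache_conf):
    """Collect 'php_value <name> <value>' directives into a dict."""
    settings = {}
    for line in apache_conf.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "php_value":
            settings[fields[1]] = fields[2]
    return settings


snippet = """php_value memory_limit 128M
php_value post_max_size 16M
php_value upload_max_filesize 2M
"""
settings = php_values(snippet)
post = to_bytes(settings["post_max_size"])
upload = to_bytes(settings["upload_max_filesize"])
print(to_bytes(settings["memory_limit"]), post > upload)  # 134217728 True
```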

After the modifications were made, Apache should be restarted with the following

command:

# /etc/init.d/httpd restart

This makes the Zabbix server ready to use and the next step is to install the agent on a

remote host.

3.2. Agent installation

The Zabbix agent is deployed on a remote host to actively monitor local resources and

applications (hard drives, memory, processor statistics, etc.). The agent gathers

operational information locally and reports data to the Zabbix server for further processing.

In case of failures (such as a hard disk running full or a crashed service process), the

Zabbix server can actively alert the administrators of a particular machine. Zabbix agents

use native system calls for gathering statistical information about the system parameters,

which is one of the reasons for their efficiency [5].

Zabbix agents can perform two types of checks: passive and active. In a passive check,

the agent responds to a data request. The Zabbix server (or proxy) asks for a piece of

data, for example, the CPU load, and the Zabbix agent sends back the result. Active

checks, on the other hand, require more complex processing on the agent side. First, the

agent must retrieve a list of items from the Zabbix server to process independently and

then it periodically sends new values to the server. The Zabbix servers (or proxies) from which the agent requests active checks are listed in the “ServerActive” parameter of the agent’s configuration file, and the “RefreshActiveChecks” parameter sets how often the list of active checks is refreshed. If the remote host tries to refresh the list of active checks and fails, the request is retried after a hardcoded 60 seconds.

The choice between passive and active checks is made by selecting the respective monitoring item type: the Zabbix agent processes items of the type “Zabbix agent” as passive checks and items of the type “Zabbix agent (active)” as active checks.
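Both kinds of checks travel over TCP in messages framed with the Zabbix protocol header. As documented for Zabbix 3.0, the header consists of the four bytes ZBXD, a flag byte 0x01 and the payload length as an 8-byte little-endian integer. This Python sketch only builds and parses such messages (no sockets are opened); the host name web01 is hypothetical.

```python
import json
import struct

HEADER = b"ZBXD\x01"  # protocol signature followed by the flags byte


def pack(payload: bytes) -> bytes:
    """Prefix a payload with the header and its 8-byte little-endian length."""
    return HEADER + struct.pack("<Q", len(payload)) + payload


def unpack(message: bytes) -> bytes:
    """Validate the header and return exactly the announced payload."""
    if not message.startswith(HEADER):
        raise ValueError("not a Zabbix protocol message")
    (length,) = struct.unpack("<Q", message[5:13])
    payload = message[13:13 + length]
    if len(payload) != length:
        raise ValueError("message is truncated")
    return payload


# Passive check: the agent answers a value request (e.g. system.cpu.load).
reply = pack(b"0.150000")
# Active checks: the agent asks the server (port 10051) for its item list.
request = pack(json.dumps({"request": "active checks",
                           "host": "web01"}).encode("utf-8"))
print(unpack(reply), json.loads(unpack(request))["request"])
```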

The Zabbix agent on UNIX/Linux is designed to run as a non-root user. If it is run as the

user “root”, it will switch to a hardcoded “zabbix” user, which must be present on the

monitored system. By default, the Zabbix agent starts six processes and uses around


3MB of RAM and 0.1% of processor power. Network traffic for communicating with the

server is minimal and can be measured in bytes.

Agent installation on most common operating systems is simple, as shown below.

Windows:

An archive with the latest (or preferred) version of the Zabbix agent is downloaded from

http://www.zabbix.com/downloads/3.0.0/zabbix_agents_3.0.0.win.zip and extracted to the

preferred location. Assuming the configuration file is C:\zabbix\zabbix_agentd.conf, the

following must be executed in the command line:

# c:\zabbix\bin\win64> zabbix_agentd.exe -c c:\zabbix\zabbix_agentd.conf --install

It is also possible to install multiple instances of the Zabbix agent by executing the

following lines in the command line:

# zabbix_agentd.exe --config <configuration_file_for_instance_1> --install --multiple-agents

# zabbix_agentd.exe --config <configuration_file_for_instance_2> --install --multiple-agents

...

# zabbix_agentd.exe --config <configuration_file_for_instance_X> --install --multiple-agents

The agent can be started and stopped from Control Panel or from the command line with

the commands:

# zabbix_agentd.exe --start

# zabbix_agentd.exe --stop

CentOS:

When installing the Zabbix agent on a CentOS machine, the Zabbix repository should first

be added with the following command:

# rpm -Uvh http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/zabbix-release-3.0-1.el6.noarch.rpm

After that, the agent can be installed:

# yum install zabbix-agent

The configuration file /etc/zabbix/zabbix_agentd.conf should contain the IP of the Zabbix

server and the desired hostname of the monitored host (for easier distinction between


hosts in the Zabbix web frontend). For the changes in the configuration file to take effect,

the agent must be restarted:

# /etc/init.d/zabbix-agent restart
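The edits of Server and Hostname described above can also be scripted, which is convenient when many agents are deployed. The Python sketch below replaces the first uncommented Key=Value line for a key or appends one if none exists; the host name web01 and the starting configuration text are hypothetical, and 149.5.187.199 is the server address used for ServerActive in Section 4.1.2.

```python
import re


def set_conf_value(text, key, value):
    """Set key=value in Zabbix-style config text.

    Replaces the first uncommented line for `key`; appends one otherwise."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function as replacement avoids escaping issues with backslashes.
        return pattern.sub(lambda _match: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


conf = "# Server=\nServer=127.0.0.1\nHostname=Zabbix server\n"
conf = set_conf_value(conf, "Server", "149.5.187.199")
conf = set_conf_value(conf, "Hostname", "web01")
print(conf)
```

Because the pattern is anchored at the start of a line and includes the "=", commented-out lines and longer keys such as ServerActive are left untouched. The agent still has to be restarted after the file is written back.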

Ubuntu:

The installation on Ubuntu and other Debian-like systems is similar to the procedure

explained in the CentOS example. First, the Ubuntu 14.04 repository is added:

# wget http://repo.zabbix.com/zabbix/3.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_3.0-1+trusty_all.deb

# sudo dpkg -i zabbix-release_3.0-1+trusty_all.deb

# sudo apt-get update

After this, the agent is installed with

# apt-get install zabbix-agent

and after modifications in the configuration file (Server and Hostname) the agent is

restarted with

# /etc/init.d/zabbix-agent restart

All further host setup (host properties, items, actions and triggers) is done through the web

interface.


4. Monitoring with Zabbix

As there are many different systems with varying application types, there is no easy

answer to the question of what to monitor. First of all, the metrics that should be monitored

greatly depend on the type of device (whether it is a server, network switch, printer or

some other network device). That fact usually affects the decision whether to install an

agent or to go with agentless monitoring. If a native Zabbix agent is installed, over 70

predefined types of metric data can be monitored (graphed, calculated and visualized).

Some items supported on the Linux operating system include agent.hostname,

agent.ping, agent.version, kernel.maxfiles, kernel.maxproc, log, logrt, net.dns,

net.dns.record, net.if.in, net.if.out, net.if.total, net.tcp.port, net.udp.listen, proc.num,

system.hw.cpu, system.uptime and so on. Windows operating systems have their

corresponding items, such as eventlog, net.if.list, perf_counter, proc_info, service.info,

services and similar. Checks for agentless monitoring include TCP checks, SNMP checks,

IPMI, JMX, VMware, SSH and telnet.

Regardless of the agent/agentless choice, items (or metrics) can be divided into the categories of system statistics, applications and databases. System statistics include items like system.cpu.load, system.cpu.util, system.uptime, system.boottime, system.cpu.intr and the like. Individual applications can be monitored through a dedicated set of predefined items. For example, the application “MySQL Server” may contain all items related to the MySQL server: the availability of MySQL, disk space, processor load, the number of transactions per second, the number of slow queries and much more. Databases are commonly monitored through ODBC and the corresponding database drivers. This way, a Zabbix server can collect any data stored in RDBMS (Relational Database Management System) databases such as MySQL, PostgreSQL, Oracle and Microsoft SQL Server. This makes monitoring more effective because information is collected directly from the data in the database, which helps avoid false positives.
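In Zabbix, such database checks are configured as db.odbc.select items, whose value is the first column of the first row returned by an SQL query. The Python sketch below imitates that behaviour with the standard-library sqlite3 module standing in for an ODBC connection; the orders table and its contents are invented for the example, and returning None for an empty result is a simplification (how Zabbix handles an empty result is not modelled).

```python
import sqlite3


def select_first_value(connection, query):
    """Return the first column of the first result row, or None if no rows."""
    row = connection.execute(query).fetchone()
    return None if row is None else row[0]


# In-memory database standing in for the monitored application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("new",), ("new",), ("shipped",)])

pending = select_first_value(
    conn, "SELECT COUNT(*) FROM orders WHERE status = 'new'")
print(pending)  # 2
```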

Zabbix also offers a low-level discovery (LLD) feature, which provides a way to automatically create items, triggers and graphs for different entities on a remote host. This way, Zabbix can automatically start monitoring file systems or network interfaces without the need to create items for each file system or network interface manually.
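A low-level discovery rule returns JSON that lists the discovered entities as LLD macros; for file systems, Zabbix 3.0 uses the {#FSNAME} and {#FSTYPE} macros inside a top-level "data" array, and item prototypes such as vfs.fs.size[{#FSNAME},free] are then instantiated once per entry. The Python sketch below produces that format from /proc/mounts-style lines; the set of file-system types to skip is an assumption, and this is not the agent's own vfs.fs.discovery implementation.

```python
import json

# Pseudo file systems that are usually not worth monitoring (assumption).
SKIPPED_TYPES = {"proc", "sysfs", "devpts", "tmpfs"}


def fs_discovery(mounts_text):
    """Build LLD JSON from /proc/mounts-style lines.

    Each line: device, mount point, file-system type, options, dump, pass."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] in SKIPPED_TYPES:
            continue
        entries.append({"{#FSNAME}": fields[1], "{#FSTYPE}": fields[2]})
    return json.dumps({"data": entries})


sample = ("/dev/sda1 / ext4 rw 0 0\n"
          "proc /proc proc rw 0 0\n"
          "/dev/sda2 /var xfs rw 0 0\n")
print(fs_discovery(sample))
```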


4.1. Discovery

Zabbix's discovery facilities are based on a set of rules that periodically scan the network,

and then react according to predefined conditions.

4.1.1. Network discovery

Zabbix has a flexible automatic network discovery functionality. With network discovery

properly set up, the administrator can speed up Zabbix deployment, simplify

administration and use Zabbix in rapidly changing environments without excessive

administration. Zabbix network discovery is based on the following information [5]:

● IP ranges;

● Availability of external services (SSH, HTTP, POP3, IMAP, TCP, etc.);

● Information received from Zabbix agent (only unencrypted mode is supported);

● Information received from SNMP agent.

Network discovery consists of two phases: discovery and actions. This feature does not

enable the discovery of network topology.

Zabbix periodically scans the IP ranges previously defined in network discovery rules. The

frequency of the check can be configured for each rule individually. Each discovery rule is always processed by a single discoverer process; an IP range is never split between multiple discoverer processes.
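The IP range of a rule may be written, for example, as a single address, a range in the last octet such as 192.168.1.1-255, a CIDR block such as 149.5.187.0/24, or a comma-separated list of these. The Python sketch below expands such a specification into the addresses a discoverer would probe. It handles only IPv4 and, for CIDR blocks, uses the ipaddress module's hosts(), which leaves out the network and broadcast addresses; whether Zabbix does the same is not assumed here.

```python
import ipaddress


def expand_ip_range(spec):
    """Expand '10.0.0.1-3', '149.5.187.0/24' or comma-separated lists."""
    addresses = []
    for part in (p.strip() for p in spec.split(",")):
        if "/" in part:
            network = ipaddress.IPv4Network(part, strict=False)
            addresses.extend(str(ip) for ip in network.hosts())
        elif "-" in part:
            start, last_octet = part.rsplit("-", 1)
            first = ipaddress.IPv4Address(start)
            prefix = start.rsplit(".", 1)[0]
            last = ipaddress.IPv4Address(f"{prefix}.{last_octet}")
            addresses.extend(str(ipaddress.IPv4Address(n))
                             for n in range(int(first), int(last) + 1))
        else:
            addresses.append(str(ipaddress.IPv4Address(part)))
    return addresses


print(len(expand_ip_range("149.5.187.0/24")))  # 254
```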

Each rule has a set of service checks defined to be performed for the desired IP range.

Every service or host check generates a discovery event. The possible events are listed in Table 4.1.

| Event | Result of the service check |
|---|---|
| Service Discovered | The service is up after it was down or when discovered for the first time. |
| Service Up | The service is up, consecutively. |
| Service Lost | The service is down after it was up. |
| Service Down | The service is down, consecutively. |
| Host Discovered | At least one service of a host is up after all services of that host were down. |
| Host Up | At least one service of a host is up, consecutively. |
| Host Lost | All services of a host are down after at least one was up. |
| Host Down | All services of a host are down, consecutively. |

Table 4.1 Possible events resulting from network discovery
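The rules of Table 4.1 can be written as two small functions. In this Python sketch, a service that is down the first time it is checked produces "Service Down", a case the table does not spell out and which is therefore an assumption; host events are derived from the list of service results of one host.

```python
def service_event(previous_up, current_up):
    """Discovery event for one service.

    previous_up is None if the service has never been checked before."""
    if current_up:
        return "Service Up" if previous_up else "Service Discovered"
    return "Service Lost" if previous_up else "Service Down"


def host_event(previous_results, current_results):
    """Discovery event for a host from per-service up/down results.

    previous_results is None for a host that has never been checked."""
    was_up = bool(previous_results) and any(previous_results)
    is_up = any(current_results)
    if is_up:
        return "Host Up" if was_up else "Host Discovered"
    return "Host Lost" if was_up else "Host Down"


print(service_event(None, True))               # Service Discovered
print(host_event([True, False], [False, False]))  # Host Lost
```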


One or more actions can be performed based on a discovery event. These actions include sending notifications, adding or removing hosts, enabling or disabling hosts, adding hosts to a group, removing hosts from a group, linking hosts to a template, unlinking hosts from a template, executing remote scripts and so on.

For example, if the Add host action is selected and a new host is discovered, its host name is the result of a reverse DNS lookup or, if the reverse lookup fails, its IP address. Newly created hosts are added to the Discovered hosts group by default. This level of automation speeds up the process of setting up monitoring for newly added or installed hosts.
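The naming rule can be sketched in Python with socket.gethostbyaddr, which performs a reverse lookup and raises an OSError subclass (socket.herror or socket.gaierror) when it fails. The resolver is passed in as a parameter so the example runs without network access; the host name mail.example.com and the addresses are hypothetical.

```python
import socket


def discovered_host_name(ip, resolver=socket.gethostbyaddr):
    """Host name for a newly discovered host: reverse DNS, else the IP."""
    try:
        name, _aliases, _addresses = resolver(ip)
        return name
    except OSError:  # socket.herror and socket.gaierror derive from OSError
        return ip


def fake_resolver(ip):
    """Stand-in for DNS: only one address has a PTR record."""
    if ip == "149.5.187.10":
        return ("mail.example.com", [], [ip])
    raise socket.herror(1, "Unknown host")


print(discovered_host_name("149.5.187.10", fake_resolver))  # mail.example.com
print(discovered_host_name("149.5.187.11", fake_resolver))  # 149.5.187.11
```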

Figure 4.1 shows an example of a network discovery rule used to find all devices in the 149.5.187.0/24 network, with an “ICMP ping” check performed for every host.

Figure 4.1 An example of a network discovery rule

The criterion for device uniqueness is the IP address (the other possible option is “Type of discovery check”, which can be either an SNMP or a Zabbix agent check). When the discovery rule has run successfully, the Zabbix dashboard shows the number of discovered devices (see Figure 4.2).


Figure 4.2 The number of discovered devices on the Zabbix dashboard

4.1.2. Agent auto-registration

With active agent auto-registration, the server can automatically start monitoring newly discovered hosts, so new hosts are monitored without being configured manually on the server. Auto-registration happens when a previously unknown active agent asks for checks. When adding a new server to the monitored environment, it is only necessary to install a Zabbix agent (active) and point it to the Zabbix server.

Active agent auto-registration also supports monitoring the added hosts with passive checks. When the active agent asks for checks, it also sends the “ListenIP” and “ListenPort” values to the server if these parameters are defined in its configuration file. When adding the new auto-registered host, the server uses the received IP address and port to configure the agent interface. If no IP address is received, the one used for the incoming connection is used; if no port is received, port 10050 is used.
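The fallback rules for the agent interface can be illustrated with a short Python sketch: it reads `Key=Value` lines from an agent configuration file and derives the IP address and port the server would use. The helper names are hypothetical, and taking the first entry of a comma-separated `ListenIP` list is an assumption of this sketch.

```python
from typing import Dict, Mapping, Tuple

DEFAULT_AGENT_PORT = 10050


def parse_agent_conf(text: str) -> Dict[str, str]:
    """Read Key=Value lines of an agent configuration file, skipping comments."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def registered_interface(conf: Mapping[str, str], peer_ip: str) -> Tuple[str, int]:
    """Interface the server would configure for an auto-registered host."""
    # Assumption: if ListenIP holds a comma-separated list, the first entry is used.
    listen_ip = conf.get("ListenIP", "").split(",")[0].strip()
    listen_port = conf.get("ListenPort", "").strip()
    ip = listen_ip or peer_ip                      # fall back to the connection's source IP
    port = int(listen_port) if listen_port else DEFAULT_AGENT_PORT
    return ip, port
```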

When a Zabbix agent is installed on a server, its configuration file should contain the IP address of the Zabbix server in the ServerActive directive (ServerActive=149.5.187.199 in this case). In the Configuration tab of the Zabbix frontend, Auto registration is chosen as the event source and a new action is created. Figure 4.3 shows the parameters of the action, such as its name and the default message sent after the host is registered. The operation defined is Add host, which adds the registered host to the list of monitored hosts.

Figure 4.3 Auto-registration action parameters


This feature is very useful for the automatic monitoring of new virtual nodes: as soon as the Zabbix agent is installed on a new node, Zabbix automatically starts collecting the host’s performance and availability data.

4.1.3. Low-level auto discovery

This feature has been a standard part of Zabbix since version 2.0. It provides a way to

automatically create items, triggers, and graphs for different entities on a monitored host.

In that manner, Zabbix can automatically start monitoring file systems or network

interfaces on the remote machine, without having to create items for each file system or

network interface manually. It is also possible to remove unneeded items automatically

based on actual results of periodically performed discovery.

Six types of discovery items are supported by default [5]:

● discovery of file systems (since Zabbix agent version 2.0);

● discovery of network interfaces (since Zabbix agent version 2.0);

● discovery of CPUs and CPU cores (since Zabbix agent version 2.4);

● discovery of SNMP OIDs (since Zabbix server and proxy version 2.0);

● discovery using ODBC SQL queries (since Zabbix server and proxy version 3.0);

● discovery of Windows services (since Zabbix server version 3.0).

The process is as follows: first, a discovery rule must be created under “Configuration” → “Templates”, in the “Discovery” column, as shown in Figure 4.4.

Figure 4.4 Mounted filesystem discovery rule


A discovery rule consists of an item that discovers the necessary entities (for instance, file

systems or network interfaces) and prototypes of items, triggers, and graphs that should

be created based on the value of that item. The discovery rule contains general discovery

rule attributes: Name, Type, Key, Update interval (in seconds), Custom intervals, Keep

lost resources period (in days), Description and Enabled parameter. A list of prototypes

attached to the filesystem discovery rule is shown in Figure 4.5.

Figure 4.5 Prototypes for filesystem discovery rule

For extended functionality and customization, it is also possible to create a custom low-level discovery rule. This enables the discovery of any type of entity, such as the databases on a database server. The custom item only needs to return a JSON document that lists the discovered objects and, optionally, some of their properties.
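A custom discovery item can be backed by a small script, for instance one exposed on the agent through a UserParameter. The Python sketch below builds such a document in the layout used by Zabbix 3.x as we understand it, an object whose "data" array holds one macro-to-value mapping per discovered entity. The macros {#DBNAME} and {#OWNER} and the sample databases are hypothetical.

```python
import json
from typing import Iterable, Mapping


def lld_json(entities: Iterable[Mapping[str, str]]) -> str:
    """Turn discovered entities into a low-level discovery document.

    Each entity maps a property name to a value, e.g. {"dbname": "zabbix"};
    property names become LLD macros such as {#DBNAME}.
    """
    rows = [
        {"{#" + name.upper() + "}": value for name, value in entity.items()}
        for entity in entities
    ]
    return json.dumps({"data": rows}, indent=2)


if __name__ == "__main__":
    # Hypothetical result of listing databases; a real script would query the server.
    print(lld_json([{"dbname": "zabbix", "owner": "zabbix"},
                    {"dbname": "billing", "owner": "postgres"}]))
```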

When a rule is created, an item prototype should be created as well. Prototypes should use macros; in this example, the macro {#FSNAME} stands for the file system name. When the discovery rule is processed, this macro is substituted with the name of each discovered file system.
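The macro substitution can be pictured as plain string replacement applied once per row of discovery data. This Python sketch shows how one prototype turns into several concrete items; it does no quoting of values that contain commas or brackets, which a real implementation would have to handle, and the helper names are hypothetical.

```python
from typing import Dict, List


def expand(prototype: str, macros: Dict[str, str]) -> str:
    """Replace every LLD macro in a prototype string with its discovered value."""
    for macro, value in macros.items():
        prototype = prototype.replace(macro, value)
    return prototype


def items_from_prototype(name: str, key: str,
                         discovered: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """One concrete item (name and key) per row of discovery data."""
    return [{"name": expand(name, row), "key": expand(key, row)} for row in discovered]
```

For instance, the key prototype `vfs.fs.size[{#FSNAME},pfree]` combined with the rows for `/` and `/var` yields `vfs.fs.size[/,pfree]` and `vfs.fs.size[/var,pfree]`.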

After the item prototype is created, a trigger prototype should be created as well. Since Zabbix 3.0, it is also possible to define dependencies between trigger prototypes. A trigger prototype may depend on another trigger prototype from the same low-level discovery (LLD) rule, or on a regular trigger. A trigger prototype cannot depend on a trigger prototype from a different LLD rule, or on a trigger created from a trigger prototype. A host trigger prototype cannot depend on a trigger from a template [5].
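The dependency rules above can be encoded as a small validation function. The following Python sketch models triggers with a hypothetical `Trigger` record (its fields are assumptions made for illustration, not Zabbix's data model) and checks whether a trigger prototype may depend on a given target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    name: str
    kind: str                       # "prototype", "regular" or "discovered" (created from a prototype)
    lld_rule: Optional[str] = None  # owning LLD rule, for prototypes and discovered triggers
    on_template: bool = False       # defined on a template rather than on a host


def may_depend(prototype: Trigger, target: Trigger) -> bool:
    """Apply the dependency rules for trigger prototypes described in the text."""
    if prototype.kind != "prototype":
        raise ValueError("only trigger prototypes are checked here")
    if not prototype.on_template and target.on_template:
        return False                # host trigger prototype -> template trigger
    if target.kind == "regular":
        return True
    if target.kind == "prototype":
        return target.lld_rule == prototype.lld_rule   # only within the same LLD rule
    return False                    # triggers created from prototypes are never valid targets
```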

Finally, once a graph prototype has also been created, the complete set of items, triggers and graphs is created automatically for every newly discovered entity. The discovery of network interfaces is conducted in exactly the same way as the discovery of file systems described above, except that the discovery rule key “net.if.discovery” is used instead of “vfs.fs.discovery”, and the macro {#IFNAME} instead of {#FSNAME} in filters and item/trigger/graph prototypes. In a similar manner, it is also possible to perform the discovery of CPU cores, ODBC (Open Database Connectivity) SQL queries and SNMP OIDs (Simple Network Management Protocol Object Identifiers).


4.2. Web Monitoring

Zabbix also has a built-in way to check several availability aspects of websites. The feature, called “Web monitoring”, supports multiple steps and is based on cURL, an open-source command-line tool and library for transferring data with URL syntax [7, 14].

In order to use web monitoring, it is necessary to define web scenarios, which consist of one or more HTTP requests (steps). The Zabbix server executes the steps periodically, in a predefined order. Web scenarios are attached to hosts or templates in the same way as items or triggers, which means that they can also be created at the template level and then applied to multiple hosts in one move.

The following information is collected in any web scenario [5]:

● average download speed per second for all steps of the whole scenario;

● number of the step that failed;

● last error message.

Every step of a web scenario contains information about the following [5]:

● download speed per second;

● response time;

● response code.

Zabbix can also check for a predefined string in the content of the retrieved page, execute

a simulated login and even follow a path of simulated mouse clicks on the page.
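The per-step and per-scenario data listed above can be illustrated with a Python sketch that uses `urllib` in place of cURL. It derives response code, response time and download speed for each step, applies an optional required-string check, and summarizes the scenario. Stopping at the first failing step, averaging the per-step speeds and treating only HTTP 200 as success by default are assumptions of this sketch, and cookies are not shared between steps here.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class StepResult:
    name: str
    response_code: int      # 0 if no HTTP response was received
    response_time: float    # seconds
    download_speed: float   # bytes per second
    error: Optional[str] = None


def evaluate_step(name: str, code: int, body: bytes, elapsed: float,
                  required: Optional[str] = None,
                  expected_codes: Sequence[int] = (200,)) -> StepResult:
    """Derive the per-step metrics and decide whether the step failed."""
    speed = len(body) / elapsed if elapsed > 0 else 0.0
    error = None
    if code not in expected_codes:
        error = f"unexpected response code {code}"
    elif required is not None and required.encode() not in body:
        error = f'required string "{required}" not found'
    return StepResult(name, code, elapsed, speed, error)


def run_step(name: str, url: str, required: Optional[str] = None,
             timeout: float = 15.0) -> StepResult:
    """Fetch one URL and evaluate the response."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:   # 4xx/5xx responses still carry a status code
        code, body = err.code, err.read()
    except OSError as err:                  # DNS failure, refused connection, timeout
        return StepResult(name, 0, time.monotonic() - start, 0.0, str(err))
    return evaluate_step(name, code, body, time.monotonic() - start, required)


def summarize(results: List[StepResult]) -> Dict[str, object]:
    """Scenario-level data: average speed, number of the failed step, last error."""
    speeds = [r.download_speed for r in results]
    failed = next((i for i, r in enumerate(results, start=1) if r.error), 0)
    last_error = next((r.error for r in reversed(results) if r.error), "")
    return {
        "avg_download_speed": sum(speeds) / len(speeds) if speeds else 0.0,
        "failed_step": failed,              # 0 means every step succeeded
        "last_error": last_error,
    }


def run_scenario(steps: List[Tuple[str, str, Optional[str]]]) -> Dict[str, object]:
    """Run (name, url, required string) steps in order, stopping at the first failure."""
    results: List[StepResult] = []
    for name, url, required in steps:
        results.append(run_step(name, url, required))
        if results[-1].error:
            break
    return summarize(results)
```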

Zabbix web monitoring supports both HTTP and HTTPS. When running a web scenario,

Zabbix will optionally follow redirects (if the option Follow redirects is active). The

maximum number of redirects is hard-coded to 10. All cookies are preserved during the

execution of a single scenario.
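The redirect and cookie behavior can be mimicked with Python's standard library: one opener, with its own cookie jar, is created per scenario run and reused by every step, and the redirect handler is capped at 10 redirects (Python's own default happens to be 10 as well). This is an illustrative sketch, not how Zabbix implements it; `build_scenario_opener` and `NoRedirect` are hypothetical names.

```python
import http.cookiejar
import urllib.request
from typing import Tuple

MAX_REDIRECTS = 10  # the limit stated in the text


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects: the 3xx response then surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def build_scenario_opener(follow_redirects: bool = True
                          ) -> Tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Create one opener per scenario run; all steps of that run must reuse it."""
    jar = http.cookiejar.CookieJar()  # cookies live only for this scenario run
    redirects = urllib.request.HTTPRedirectHandler() if follow_redirects else NoRedirect()
    redirects.max_redirections = MAX_REDIRECTS
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar), redirects)
    return opener, jar
```

Each step would then call `opener.open(url, timeout=15)`, so a session cookie set by a login step is sent with the following steps, and the jar is discarded when the scenario ends.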

As with all other gathered data, the data collected from executing web scenarios is kept in the database and is automatically used for graphs, triggers and notifications.

As an example of a web scenario, the Zabbix web interface itself will be tested. A new web scenario is created for a host, with the parameters shown in Figure 4.6.


Figure 4.6 Parameters of the Zabbix web scenario

As shown in Figure 4.6, the variables user and password are used to gain access to the

Zabbix web interface. The Steps tab contains only one step, which tests the response

time of the defined URL. Based on that information, Zabbix generates a response time

graph, like the one shown in Figure 4.7.

Figure 4.7 Web scenario – response time graph


4.3. Distributed monitoring

Zabbix provides real-time distributed monitoring with centralized web administration. It allows monitoring the health of any host on the network from a single point of entry. Performance monitors include everything from memory, processor and swap space usage to free disk space on all mounted partitions, running processes, disk read/write operations and much more. Custom checks can also be written, so virtually every metric on the system can be monitored. However, all of this becomes more complicated if the infrastructure spans multiple geographical locations and numerous demilitarized zones (DMZs); that is where the adjective “distributed” really comes into play [3].

4.3.1. Why distributed?

The limits of a single-server configuration become most visible when thousands of hosts must be monitored in a large infrastructure with a complex network topology, or when different geographical locations with slow or unreliable connections must be managed.

Many DMZs and network segments with a strict security policy do not allow two-way communication between hosts on either side, so a Zabbix server may be unable to communicate with all the agents on the other side of a firewall. Different branches of the same company, or different companies in the same group, may need some independence in managing their respective networks, while still needing coordination and higher-level aggregation of the monitored data. Thanks to its distributed monitoring features, Zabbix can provide adequate solutions whether the problem concerns performance, administrative independence or data retention [1].

4.3.2. Proxies

Zabbix provides an effective and reliable way of monitoring a distributed IT infrastructure using Zabbix proxies. Proxies collect data locally on behalf of a central Zabbix server and then report it to the server. They are particularly useful when multiple geographical sites need to be monitored; in that case, each site can have its own proxy (a local Zabbix monitor), which collects the data and takes part of the load off the main Zabbix server.

Zabbix proxies are lightweight and work independently; they are easy to maintain, support automatic database creation (only with SQLite [5]), are ready for embedded hardware, support one-way TCP connections and allow centralized configuration. On the other hand, they have no graphical or web interface, cannot be administered locally and cannot generate notifications or alerts.

The most common use cases for proxies include monitoring remote locations or locations with unreliable communications, offloading the Zabbix server when thousands of devices are monitored, and simplifying the maintenance of distributed monitoring. A proxy can also be useful when monitoring multiple cloud-based instances, for example on Amazon Web Services (AWS). Since a proxy needs only one TCP connection to the Zabbix server, it is easier to get its traffic through a firewall. An example of a possible firewall setup is shown in Figure 4.8.

Figure 4.8 An example of an environment with a Zabbix proxy

All data collected by the proxy is stored locally before being transmitted to the server, so no data is lost due to temporary communication problems with the server. It should be pointed out that the Zabbix proxy is only a data collector: it does not calculate triggers, process events or send alerts.
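The store-and-forward behavior can be illustrated with a small Python sketch built on SQLite, the database the text mentions for proxies. Values are deleted from the local buffer only after the send function returns successfully, so a server outage merely delays delivery. The table layout and the `send` callback are hypothetical and do not reflect the real proxy schema or protocol.

```python
import sqlite3
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, str, str, str, float]  # id, host, key, value, clock


class ProxyBuffer:
    """Store-and-forward queue: values stay until the server has accepted them."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS history ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, host TEXT, key TEXT, value TEXT, clock REAL)"
        )

    def collect(self, host: str, key: str, value: object, clock: Optional[float] = None) -> None:
        with self.db:  # commits the insert
            self.db.execute(
                "INSERT INTO history (host, key, value, clock) VALUES (?, ?, ?, ?)",
                (host, key, str(value), time.time() if clock is None else clock),
            )

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM history").fetchone()[0]

    def flush(self, send: Callable[[List[Row]], None], batch: int = 1000) -> int:
        """Send the oldest values; delete them only if sending succeeded."""
        rows = self.db.execute(
            "SELECT id, host, key, value, clock FROM history ORDER BY id LIMIT ?", (batch,)
        ).fetchall()
        if not rows:
            return 0
        try:
            send(rows)
        except ConnectionError:
            return 0  # server unreachable: keep everything for the next attempt
        with self.db:
            self.db.execute("DELETE FROM history WHERE id <= ?", (rows[-1][0],))
        return len(rows)
```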

4.3.3. Security

Security is very important when multiple locations are monitored. In that scenario, all monitoring data travels over insecure Internet channels, so encryption is a necessity. Security can also be vital for user access: in a larger organization, there may be a requirement that not all Zabbix users have access to all monitored hosts or host groups.

Encryption

Zabbix supports encrypted communication between the Zabbix server, Zabbix proxy, Zabbix agent, and the zabbix_sender and zabbix_get utilities using the Transport Layer Security (TLS) protocol, version 1.2. Encryption is supported since version 3.0. Zabbix supports both certificate-based and pre-shared key-based encryption.

Encryption is optional and configurable for individual components. In other words, some

proxies and agents can be configured to use certificate-based encryption with the server,

some others can use pre-shared key-based encryption and the rest of the agents can

continue with unencrypted communications. A server or a proxy can use different

encryption configurations for different hosts.

Zabbix daemon programs use a single listening port for encrypted and unencrypted

incoming connections. Adding encryption does not require opening new ports on firewalls.
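As an example of per-component configuration, the following Python sketch generates a random pre-shared key and the agent configuration lines that require PSK encryption in both directions (TLSConnect for connections the agent opens, TLSAccept for those it accepts). The identity string and file path are hypothetical, and the 16-byte lower bound reflects our understanding of Zabbix's 128-bit minimum; the key is commonly generated with `openssl rand -hex 32` instead.

```python
import secrets


def new_psk(num_bytes: int = 32) -> str:
    """Random pre-shared key as a hex string (32 bytes -> 64 hex digits, 256 bits)."""
    if num_bytes < 16:
        raise ValueError("use at least 16 bytes (128 bits)")
    return secrets.token_hex(num_bytes)


def agent_psk_config(identity: str, psk_file: str) -> str:
    """Agent configuration lines that require PSK encryption in both directions."""
    return "\n".join([
        "TLSConnect=psk",             # connections the agent opens (active checks)
        "TLSAccept=psk",              # connections the agent accepts (passive checks)
        f"TLSPSKIdentity={identity}",
        f"TLSPSKFile={psk_file}",
    ])


if __name__ == "__main__":
    # Hypothetical identity and path; the PSK file holds only the hex key.
    print(new_psk())
    print(agent_psk_config("PSK 001", "/etc/zabbix/zabbix_agentd.psk"))
```

The same identity and key then have to be entered for the host on the server side, otherwise the TLS handshake fails.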

User access

All users access the Zabbix application through the web-based frontend. Each user is assigned a unique username and a password. User passwords are stored in the Zabbix database in hashed form. Communication between the web server and the user's browser can be protected using SSL/TLS (HTTPS). Since security is also important in terms of user administration, Zabbix implements the following user permission schemes [5]:

Zabbix User: The user has access to the Monitoring menu, but has no access to

resources by default. Any permission to host groups must be explicitly assigned.

Zabbix Admin: The user has access to the Monitoring and Configuration menus, but has

no access to any host groups by default. Any permission to host groups must be explicitly

given.

Zabbix Super Admin: The user has access to everything: the Monitoring, Configuration and Administration menus. The user has read-write access to all host groups. Permissions cannot be revoked by denying access to specific host groups.

It is important to note that the access to any host data in Zabbix is granted to user groups

on the host group level only, which means that an individual user cannot be directly

granted access to a host (or host group). The user can only be granted access to a host

by being part of a user group that is granted access to the host group to which that

particular host belongs [5].
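The indirect path from a user to a host (user, user group, host group, host) can be made explicit in a short Python sketch. How conflicting grants are combined is an assumption here: an explicit Deny overrides everything, otherwise the highest level wins. Super Admin users, who bypass these rules, are not modeled, and all group and user names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

LEVELS = {"read": 1, "read-write": 2}


def host_permission(user: str, host: str,
                    user_groups: Dict[str, Set[str]],          # user group -> member users
                    host_groups: Dict[str, Set[str]],          # host group -> member hosts
                    rights: Dict[Tuple[str, str], str],        # (user group, host group) -> level
                    ) -> Optional[str]:
    """Effective access of a user to a host; None means no access (the default)."""
    granted = [
        level
        for (ugroup, hgroup), level in rights.items()
        if user in user_groups.get(ugroup, set()) and host in host_groups.get(hgroup, set())
    ]
    if not granted:
        return None
    if "deny" in granted:
        return "deny"  # assumption: an explicit Deny overrides any grant
    return max(granted, key=LEVELS.__getitem__)
```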

When creating a new user, the dialog asks for the following data [5]:
● Alias: a unique username;
● Name: the user’s first name;
● Surname: the user’s surname;
● Password: two fields for entering the user password;
● Groups: a list of all user groups the user belongs to;
● Language: the preferred language of the Zabbix frontend;
● Theme: the look of the frontend, with three themes to choose from (System default, Blue and Dark);
● Auto-login: if enabled, Zabbix remembers the user’s credentials and keeps the user logged in for 30 days;
● Auto-logout: automatically logs the user out after the set number of seconds of inactivity (minimum 90 seconds);
● Refresh: the refresh rate used for graphs, screens and plain text data (0 disables refreshing);
● Rows per page: how many rows per page are displayed in lists;
● URL (after login): a specific URL to which Zabbix sends the user after a successful login.

The Media tab contains a listing of all media defined for the user. Here the term “Media”

denotes the channels for sending notifications. The Permissions tab contains information

about the user type and the host groups/hosts the user has access to.

It often makes sense to separate the information available to one group of users from that available to another. This can be accomplished by grouping users and then assigning them different permissions to host groups. When a user group is created, the parameters shown in Table 4.2 are used:

Parameter        Description
Group name       Unique group name.
Users            List of the members of this group.
Frontend access  How the users of the group are authenticated:
                 System default: use the default authentication method;
                 Internal: use Zabbix authentication (ignored if HTTP authentication is set);
                 Disabled: access to Zabbix is forbidden.
Enabled          Status of the user group and its members:
                 checked: the user group and its users are enabled;
                 unchecked: the user group and its users are disabled.
Debug mode       Mark this checkbox to activate debug mode for the users.

Table 4.2 User group creation parameters

Furthermore, composing permissions and calculated permissions can be defined as Read-write, Read or Deny.

To enhance security and activity logging, Zabbix provides configuration auditing for all types of configuration changes (changes made to the Zabbix configuration and changes made to hosts and devices). Every change a user makes is immediately recorded in the audit log, which is useful when all activities that occurred since the last login need to be tracked. Auditing information includes the ID of the user who logged into the Zabbix administration console, the resource that was modified, the action that was performed and additional details about the specific event, such as which new host was added or how host A is linked to host B [3].

4.4. High availability and failover

By definition, high availability is a characteristic of a system that aims to ensure a higher-than-average level of operational performance, usually measured as uptime. In combination with a distributed setup, high availability is a best-practice scenario for large and complex environments with high demands.

In high availability engineering, single points of failure should be eliminated through

redundancy, the crossover between components or devices must be reliable and failures

should be detected as they appear.

The terms scheduled and unscheduled downtime are well known in the IT industry. Scheduled downtime is typically maintenance that cannot be avoided and that interrupts the system’s normal operations. It is usually initiated by management and is inevitable within the current system design. Unscheduled downtime, on the other hand, is normally the consequence of some physical event, such as a hardware failure, a faulty network connection, a power outage, an operating system failure or an application error. From the high availability perspective, however, downtime is downtime, whether it is scheduled or not.

4.4.1. Levels of IT service

Uptime is not synonymous with availability. A system can be up and running but not

available; for instance, in case of a network fault, the service will not be available, but all

the systems will be up and running. Availability must be calculated end-to-end, and all the

components required to run the service must be available.

Availability is calculated as the percentage of uptime over a year. Downtime and service unavailability usually affect monthly invoices, because it is not uncommon for a service provider to offer a discount if the uptime percentage drops below the value agreed in the Service Level Agreement (SLA). Table 4.3 shows the correlation between availability percentages and the allowed downtime intervals.


Availability                 Downtime per year     Downtime per month    Downtime per day
90% (“one nine”)             36.5 days             72 hours              2.4 hours
99% (“two nines”)            3.65 days             7.20 hours            14.4 minutes
99.9% (“three nines”)        8.76 hours            43.8 minutes          1.44 minutes
99.99% (“four nines”)        52.56 minutes         4.38 minutes          8.64 seconds
99.999% (“five nines”)       5.26 minutes          25.9 seconds          864 milliseconds
99.9999% (“six nines”)       31.5 seconds          2.59 seconds          86.4 milliseconds
99.99999% (“seven nines”)    3.15 seconds          262.97 milliseconds   8.64 milliseconds
99.999999% (“eight nines”)   315.569 milliseconds  26.297 milliseconds   0.864 milliseconds
99.9999999% (“nine nines”)   31.5569 milliseconds  2.6297 milliseconds   0.0864 milliseconds

Table 4.3 Common availability percentages with corresponding downtime intervals
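Each entry in Table 4.3 is the unavailable fraction, 1 − availability/100, multiplied by the length of the period. The Python sketch below recomputes the table under the assumption of an average year of 365.25 days and a month of one twelfth of that; the table itself mixes conventions (for example, 72 hours per month at 90% corresponds to a 30-day month), so some recomputed values differ slightly, such as roughly 73 hours instead of 72.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY   # assumption: average (Julian) year
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12      # assumption: average month


def downtime_budget(availability_percent: float) -> dict:
    """Allowed downtime, in seconds, per year, month and day."""
    unavailable = 1 - availability_percent / 100
    return {
        "year": unavailable * SECONDS_PER_YEAR,
        "month": unavailable * SECONDS_PER_MONTH,
        "day": unavailable * SECONDS_PER_DAY,
    }


def human(seconds: float) -> str:
    """Format a duration with the largest unit that keeps the value at least 1."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60), ("seconds", 1)):
        if seconds >= size:
            return f"{seconds / size:.4g} {unit}"
    return f"{seconds * 1000:.4g} milliseconds"


if __name__ == "__main__":
    for percent in (90, 99, 99.9, 99.99, 99.999, 99.9999, 99.99999, 99.999999, 99.9999999):
        budget = downtime_budget(percent)
        print(f"{percent}%", *(human(budget[period]) for period in ("year", "month", "day")))
```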

For example, a managed IT hosting service might offer an SLA that includes 99.9% uptime, with a 10% service discount if uptime falls below 99.9% and a 30% discount if it falls below 99%. The SLA should also define response times (such as mean time to respond and mean time to repair) and the cases excluded from the uptime calculation (such as failures of the electricity supply network, excess humidity, high temperature or application errors).
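The example SLA can be turned into a small calculation: measured availability for a billing period, with excluded downtime not counted, followed by the discount tier. The following Python sketch assumes a 30-day month in its usage example and treats excluded downtime simply as subtracted from the measured downtime; both are assumptions for illustration.

```python
def availability(downtime_s: float, period_s: float, excluded_s: float = 0.0) -> float:
    """Availability in percent; downtime excluded by the SLA is not counted."""
    counted = max(downtime_s - excluded_s, 0.0)
    return 100.0 * (1 - counted / period_s)


def sla_discount(availability_percent: float) -> int:
    """Discount tiers of the example SLA: 10% below 99.9%, 30% below 99%."""
    if availability_percent < 99.0:
        return 30
    if availability_percent < 99.9:
        return 10
    return 0
```

With a 30-day month (2,592,000 seconds), one hour of counted downtime gives about 99.86% availability and therefore a 10% discount, while ten hours drop availability to about 98.6% and trigger the 30% tier.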

4.5. Zabbix and High availability

If the whole monitored infrastructure is fully redundant and built with high availability principles in mind, the same should apply to the central monitoring system. In the worst-case scenario, if a hardware or software issue occurs on the monitoring system itself, the whole team or company could lose access to very important monitoring and trending data; worse, if an unrelated failure occurs on the network while the monitoring system is down, no notifications may be sent out at all. With that in mind, it is recommended to ensure three-tier redundancy for a Zabbix installation: the web interface, the Zabbix server and the database.


4.5.1. Zabbix server and web interface

The web application (the Zabbix frontend) does not produce or generate data or files of any kind on the web server. One way of achieving high availability is therefore to deploy two nodes on two different servers, implementing a highly available, fault-tolerant disaster-recovery setup. The only other component needed is a resource manager that detects the failure of the primary node and coordinates the failover to the secondary node. An example is Pacemaker/Corosync: Pacemaker acts as the cluster resource manager, while Corosync is a software layer that provides the messaging service between servers within the same cluster. Corosync allows any number of servers to be part of the cluster, using different fault-tolerant configurations such as Active-Active, Active-Passive and N+1. Among its tasks, Corosync checks that Pacemaker is running and bootstraps all the processes that are needed [1].

When configuring a highly available setup for Apache or any other service, it is useful to understand a technique called STONITH (“Shoot The Other Node In The Head”). A split-brain scenario occurs when each node believes that the other one has failed and that it should itself act as the primary node. Fencing is the isolation of a failed node so that it cannot disrupt the cluster. As its name suggests, STONITH fences a failed node by resetting it or powering it down. A critical host error in a cluster can have catastrophic results, such as both nodes trying to write to a shared storage resource, and STONITH provides effective protection against such problems. One effective way to prevent split-brain situations is to deploy a cluster of three or more members, so that when one member suffers an error and becomes unreliable or unresponsive, the rest of the cluster still has a quorum and can decide to cut out the failed node.
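The benefit of three or more members follows from the majority rule behind a quorum. This Python sketch assumes one vote per node and a strict majority of all configured votes, and it ignores special modes that cluster managers such as Corosync offer for two-node clusters.

```python
from typing import List


def has_quorum(votes: int, total_votes: int) -> bool:
    """Strict majority: more than half of all configured votes."""
    return votes > total_votes // 2


def partitions_with_quorum(partition_sizes: List[int]) -> List[int]:
    """Which sides of a split cluster may keep running (one vote per node)."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, total)]
```

A two-node cluster split into `[1, 1]` leaves no side with a quorum, so neither node can safely decide alone; a three-node cluster split into `[2, 1]` lets the two healthy members keep running and fence the third.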

In the case when the Zabbix server is separated from the web interface and the database,

high availability is accomplished in the same way as with the web interface - with

Pacemaker/Corosync.

4.5.2. Zabbix database

To implement this solution, two database servers are needed with two installations of the

same software and operating system. Since this setup involves two different servers, the

data needs to be replicated between them. This implies that the servers need to be

interconnected with a dedicated network connection that is capable of providing the

needed throughput.


One way of implementing an HA setup of the database servers is to use a PostgreSQL

database cluster on top of a DRBD (Distributed Replicated Block Device) based on LVM

(Logical Volume Manager) volumes.

DRBD is based on logical block devices (conventionally named /dev/drbdX, where X is the device number) stretched over local block devices on a selected number of cluster nodes. Writes to the primary node are passed on to its lower-level block device and simultaneously propagated to the secondary node, which then writes the data to its own lower-level block device. In DRBD, all read operations are performed on local devices. DRBD is a convenient companion to the Heartbeat protocol and/or the Pacemaker/Corosync cluster manager in achieving database high availability.
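The write and read paths described above can be pictured with a deliberately simplified Python model of one replicated device: writes go to the local lower-level device and to the peer, reads stay local, and blocks written while the peer is unreachable are tracked and resynchronized later. This toy model is for illustration only and omits almost everything a real DRBD resource does (replication protocols, activity log, role changes).

```python
from typing import Dict, Optional, Set


class ReplicatedDevice:
    """Toy model of one /dev/drbdX resource shared by a primary and a secondary node."""

    def __init__(self) -> None:
        self.primary: Dict[int, bytes] = {}    # lower-level device on the primary node
        self.secondary: Dict[int, bytes] = {}  # lower-level device on the secondary node
        self.connected = True
        self.out_of_sync: Set[int] = set()     # blocks the peer has not received yet

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data             # passed on to the local lower-level device
        if self.connected:
            self.secondary[block] = data       # propagated to the peer, which stores it locally
        else:
            self.out_of_sync.add(block)        # remembered for later resynchronization

    def read(self, block: int) -> Optional[bytes]:
        return self.primary.get(block)         # reads never leave the local node

    def reconnect(self) -> None:
        self.connected = True
        for block in sorted(self.out_of_sync):
            self.secondary[block] = self.primary[block]
        self.out_of_sync.clear()
```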


5. Handling data

In a central monitoring solution such as Zabbix all the data collected from the agents is

stored on a central node, where it is handled and processed for alerting purposes,

trending analysis and visualization.

5.1. Data collection

In a Zabbix configuration, the metrics defined on a host are commonly referred to as “items”. This chapter explains how items are monitored as metrics, describes the data flow and demystifies trapper items as a means of controlling the data flow. Most monitoring applications have the data flow shown in Figure 5.1.

Figure 5.1 Agent decides on status of the measurement

In this scenario, the agent is asked not only to take a measurement, but also to make some kind of status decision about that measurement before sending it to the main server component for further processing.

The Zabbix agent or monitoring probe, by contrast, is tasked only with the measurement: it sends the measurement to the server component for storage and, eventually, for further processing. This data flow is shown in Figure 5.2.

Figure 5.2 Agent takes the measurement and sends the data to the server

The data is not associated with a specific trigger decision (OK/Warning/Critical or any other variation), but is kept on the server as a single data point or measurement. Where applicable (for numeric types), it is also kept in an aggregated trend format as the minimum, maximum and average over different periods of time. Keeping the data separate from the decision logic, but all in a single place, gives Zabbix two distinct advantages. The first is that Zabbix can be used to gather data on things that are not directly related to possible alerts and actions, but to the overall performance and behavior of a system. The second big advantage is a full, central database of raw data that can feed any trigger or decision logic. This makes it possible to define exactly the kind of event that the user wants to monitor and be alerted on. Also, triggers do not have to rely on a single measurement: anything can be correlated with anything else in the item history database.
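The separation between raw data and decision logic can be illustrated with a minimal Python sketch: values are stored without any status, and a trigger is evaluated later by querying the history, here correlating two different hosts. `ItemHistory`, `web_tier_overloaded` and the item key `pgsql.connections` are hypothetical; the sketch does not reproduce Zabbix's trigger expression syntax.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, List, Tuple


class ItemHistory:
    """Raw values per (host, key); no status decision is stored with them."""

    def __init__(self) -> None:
        self.values: DefaultDict[Tuple[str, str], List[Tuple[float, float]]] = defaultdict(list)

    def store(self, host: str, key: str, clock: float, value: float) -> None:
        self.values[(host, key)].append((clock, value))

    def last(self, host: str, key: str) -> float:
        return self.values[(host, key)][-1][1]

    def avg(self, host: str, key: str, period: float, now: float) -> float:
        """Mean of the values received in the last `period` seconds."""
        window = [v for c, v in self.values[(host, key)] if now - period < c <= now]
        return mean(window)


def web_tier_overloaded(history: ItemHistory, now: float) -> bool:
    """Hypothetical trigger: busy web server AND many database connections."""
    cpu = history.avg("web01", "system.cpu.load[all,avg1]", 300, now)
    connections = history.last("db01", "pgsql.connections")  # hypothetical key
    return cpu > 4 and connections > 100
```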

5.2. Data flow

A Zabbix item consists of three elements: an identifier, a data type and an associated host, as shown in Figure 5.3.

Figure 5.3 Zabbix item elements

The identifier (name and the associated item key) and the associated host are used to

distinguish a single item among the others. Data type is important so that Zabbix knows

how to store the data, how to visualize it, and most importantly, what kind of functions can

be applied to it in order to model triggers and further processing.
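The three elements of an item can be expressed as a small data structure. In this Python sketch, an item is identified by its host and key (the name is descriptive only), and the data type restricts which functions make sense; the set of numeric-only functions is an approximation chosen for illustration, and the class names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ValueType(Enum):
    NUMERIC_UNSIGNED = "Numeric (unsigned)"
    NUMERIC_FLOAT = "Numeric (float)"
    CHARACTER = "Character"
    LOG = "Log"
    TEXT = "Text"


NUMERIC_TYPES = {ValueType.NUMERIC_UNSIGNED, ValueType.NUMERIC_FLOAT}
NUMERIC_ONLY_FUNCTIONS = {"avg", "min", "max", "sum", "delta"}  # approximation


@dataclass(frozen=True)
class Item:
    host: str
    key: str
    name: str = ""
    value_type: ValueType = ValueType.NUMERIC_FLOAT

    @property
    def identity(self) -> Tuple[str, str]:
        """An item key is unique within a host, so (host, key) identifies the item."""
        return (self.host, self.key)

    def supports(self, function: str) -> bool:
        """Whether a trigger function can be applied to this item's data type."""
        return self.value_type in NUMERIC_TYPES or function not in NUMERIC_ONLY_FUNCTIONS
```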

5.2.1. Zabbix items

A standard Zabbix item is considered passive from the agent's point of view. This means

that it is the server's job to ask the agent, at the time intervals defined for the item, to get

the desired measurement and report it back immediately. In terms of network operations,

a single connection is initiated and brought down by the server, while the agent is in the

listening mode.
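A passive check is a single request and reply on a connection opened by the server. The Python sketch below performs the server's half of that exchange using the Zabbix 3.x framing as we understand it: the bytes "ZBXD", a flags byte 0x01, and the data length as an 8-byte little-endian number. Plain-key requests without this header are also accepted by agents of that era, as far as we know. The zabbix_get utility does the same job from the command line.

```python
import socket
import struct

HEADER = b"ZBXD\x01"  # protocol signature "ZBXD" followed by the flags byte 0x01


def pack(payload: bytes) -> bytes:
    """Prefix a payload with the header and its length (8 bytes, little-endian)."""
    return HEADER + struct.pack("<Q", len(payload)) + payload


def unpack(message: bytes) -> bytes:
    """Strip the header and return exactly the announced number of data bytes."""
    if not message.startswith(HEADER):
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", message[5:13])
    return message[13:13 + length]


def passive_get(agent: str, key: str, port: int = 10050, timeout: float = 5.0) -> str:
    """Server side of one passive check: connect, send the key, read one reply."""
    with socket.create_connection((agent, port), timeout=timeout) as sock:
        sock.sendall(pack(key.encode()))
        reply = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the agent closes the connection after answering
                break
            reply += chunk
    return unpack(reply).decode()
```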

In the case of a Zabbix active item, it is the agent's job to ask the server what monitoring

data it should gather and at what intervals. It then proceeds to schedule its own

measurements, and connects back to the server to send them over for further processing.

In terms of network operations, two separate sessions are involved in the process:

● The agent asks the server about items and monitoring intervals;

● The agent sends the monitoring data it collected to the server.
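The two sessions of an active agent can be illustrated by the JSON payloads it sends. This Python sketch builds a simplified form of the "active checks" request (session one) and the "agent data" report (session two), framed with the same header as passive checks; the real agent may include further fields, such as nanosecond timestamps, so the exact field set is an assumption.

```python
import json
import struct
import time
from typing import Iterable, Tuple

HEADER = b"ZBXD\x01"


def frame(obj: dict) -> bytes:
    """Serialize a request as JSON and prefix it with the protocol header."""
    data = json.dumps(obj).encode()
    return HEADER + struct.pack("<Q", len(data)) + data


def active_checks_request(hostname: str) -> bytes:
    """Session 1: the agent asks which items to collect and how often."""
    return frame({"request": "active checks", "host": hostname})


def agent_data_request(hostname: str, values: Iterable[Tuple[str, object, float]]) -> bytes:
    """Session 2: the agent reports (key, value, clock) tuples it collected itself."""
    return frame({
        "request": "agent data",
        "data": [{"host": hostname, "key": key, "value": str(value), "clock": int(clock)}
                 for key, value, clock in values],
        "clock": int(time.time()),
    })
```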

Apart from the network connection initiation, the main difference between a passive and an active item is that with an active item it is impossible to define flexible monitoring intervals. With a passive item, different monitoring intervals can be defined based on the time of the day and the day of the week.

Whether an item is active or passive is a property of each individual item. A host can have a mix of active and passive items, so it cannot be assumed that the agent will initiate all of its connections to the server. For that to be the case, all items that rely on the agent, including those added in the future, must be defined as active.

5.2.2. Zabbix trappers

Trapper items are, in some respects, the opposite of Zabbix's external checks from the point of view of data flow. An external check item is used when the server itself has to execute an external script to gather measurements, instead of asking an agent (Zabbix, SNMP or others). As the number of external scripts grows, they can significantly slow down the server, to the point where a large number of overdue checks accumulates while the server is busy executing external scripts.

There is a simple way to avoid this: convert all external check items to trapper items, schedule the execution of the same scripts used in the external checks, and modify the scripts so that they use zabbix_sender to send the measured data to the server.
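A converted script would normally just call `zabbix_sender -z <server> -s <host> -k <key> -o <value>`. For illustration, the sketch below reproduces what goes over the wire instead. It assumes the `"sender data"` JSON request, the `ZBXD` framing, and the default trapper port 10051. The host name `web01` and the key `custom.queue.length` are hypothetical.

```python
import json
import socket
import struct


def sender_packet(host: str, key: str, value) -> bytes:
    """Build the framed request that delivers one value to a trapper item."""
    body = json.dumps({"request": "sender data",
                       "data": [{"host": host, "key": key, "value": str(value)}]}).encode()
    return b"ZBXD\x01" + struct.pack("<Q", len(body)) + body


def send_to_trapper(server: str, host: str, key: str, value, port: int = 10051) -> dict:
    """Push one value to the server and return its JSON reply (e.g. processed/failed counts)."""
    with socket.create_connection((server, port), timeout=5) as conn:
        conn.sendall(sender_packet(host, key, value))
        reply = b""
        while chunk := conn.recv(4096):
            reply += chunk
    return json.loads(reply[13:])  # skip the 5-byte header and the 8-byte length
```

The scheduling itself stays outside Zabbix, so a slow script no longer delays the server's own checks.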


5.3. Data visualization

The Zabbix web interface is shipped with the default package. It offers both the configuration of monitoring and the viewing of gathered data, so a network administrator can see and control everything in one interface. The interface offers centralized configuration of all monitoring aspects. Configuration changes take effect without any restarts; the server picks them up at its next configuration cache update.

Global search provides suggestions based on the entered string. The string is matched against hosts only, and all suggestions are case-insensitive.

Global notifications display information that may require immediate user attention, regardless of the screen the user is looking at. They involve both showing a message and playing a sound. Global notifications can be enabled per user in the profile configuration. If they are enabled, the message timeout can be changed; by default, messages stay on screen for 90 seconds. Messages can be received for problems and for resolutions, and they can also be filtered by trigger severity. A custom sound can be assigned to every trigger severity level and to the recovery message. As messages arrive, they are displayed in a floating section on the right-hand side of the screen, which can be repositioned vertically.

The Zabbix Dashboard is the central page of the web frontend and provides a high-level, personalized overview of the monitored environment. The most useful information is available on one screen: the status of the Zabbix server, system status, host status, the last 20 issues, and web monitoring and discovery status [13]. An example of a dashboard is shown in Figure 5.4.

Figure 5.4 A simple dashboard


5.3.1. Graphs

Since Zabbix already gathers and stores the data, generating graphs from it is easy. A simple graph is available for every numeric item without any configuration; these graphs are generated at runtime. An example of a simple graph is shown in Figure 5.5.

Figure 5.5 Zabbix server CPU usage graph

Custom graphs are more powerful than simple graphs. A custom graph can compare the data of several items, and the user can specify the graph style and the way lines are displayed. Custom graphs can be created for a host, for multiple hosts or for a single template. Figure 5.6 shows an example of a custom graph: the parameters of a graph for outgoing traffic statistics on a host.

Figure 5.6 Custom graph – outgoing network

Zabbix also lets the user create instant ad-hoc graphs. Since neither simple graphs nor custom graphs offer a quick way to compare multiple items, ad-hoc graphs fill that gap: they can be created for several items at once, quickly and easily.


5.3.2. Maps

Zabbix network maps make it possible to visualize the environment as a user-friendly overview. Elements on a map may represent a host, a host group, a single trigger, an image or another map. Maps can be created by any system user. They can be either public (available to all users) or private (belonging to the owner and shared with selected users/user groups).

In the last few versions of Zabbix, map editing has been improved with modern features such as drag-and-drop support and whole-area selection. Element details can also be shown in a pop-up window.

Since map elements can show trigger-related information, a map can display changes of state on the monitored hosts. Additionally, icon and link descriptions can contain macros. For example, a link description can show the current bandwidth, and a host description can show the average processor load.

Clicking on a host gives the user access to various scripts and to links to the host screen page and the trigger status page, the latter filtered to list the currently active triggers for that host.

5.3.3. Screens and slideshows

Zabbix screens can group independent visual elements (graphs, maps, data overviews, etc.) for display in a single overview screen. In essence, a Zabbix screen is a table in which any cell can contain a graph, a user-defined graph, a map, another screen, plain text, a server information overview, a trigger information overview, a data overview, a clock, a history of events, a history of actions or a URL (a link to data taken from another location). The sample screen in Figure 5.7 shows two graphs of the Zabbix server: CPU load and CPU utilization.

Figure 5.7 A sample screen showing the CPU load and utilization


A slide show rotates several screens one after another at previously configured intervals. Screens and slide shows can be created by any system user. They can be public (available to all users) or private (belonging to the owner and shared with selected users/user groups).

5.3.4. IT Services

Zabbix also has a separate system, called “IT Services”, for categorizing monitored data. IT Services provide a hierarchical view of services: several different monitored systems can be aggregated into a single “service”. This allows uptime to be calculated not just for individual components, but for the entire service, which is consistent with the premise of availability and uptime percentage (as explained in Chapter 4.4.1).

For example, if a website is being monitored, it is necessary to identify all of its components (load balancer, application server, database server, etc.) and their corresponding triggers.

In another example, an email service depends on one server running DNS, another running SMTP and POP, and disk space on an NFS server. Using IT Services, all these monitored services can be gathered into one “email” service, which is reported as down when any of its component services is down. This also gives quick insight into which underlying component is causing an email service failure [4].
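The email example can be modelled as a small service tree. The sketch below is a simplified model, not Zabbix's implementation. It assumes the “problem if at least one child has a problem” calculation, which propagates the worst child state upward, with severities numbered 0 (OK) to 5. The `Service` class and the SLA helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    name: str
    severity: int = 0  # leaf: severity of its problem trigger, 0 = OK
    children: List["Service"] = field(default_factory=list)

    def status(self) -> int:
        """Problem if at least one child has a problem: propagate the worst child state."""
        if not self.children:
            return self.severity
        return max(child.status() for child in self.children)


def failing_components(service: Service) -> List[str]:
    """Leaf components currently in a problem state (the root-cause view)."""
    if not service.children:
        return [service.name] if service.severity > 0 else []
    return [name for child in service.children for name in failing_components(child)]


def sla(problem_seconds: float, period_seconds: float) -> float:
    """Percentage of the period during which the service was not in a problem state."""
    return 100.0 * (1 - problem_seconds / period_seconds)


email = Service("email", children=[Service("DNS"), Service("SMTP/POP", severity=4),
                                   Service("NFS disk space")])
```

Here `email.status()` reports the High problem of the SMTP/POP component, and `failing_components(email)` points to it directly.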

5.4. Incident management

Items gather the data, but a system administrator cannot watch all of it, waiting for a condition that deserves attention. That evaluation is performed by trigger expressions.

Triggers are user-definable expressions that evaluate the received data to true, false or unknown. Actions can be tied to triggers and are executed whenever a change in a trigger's state is registered. A common action is to send a notification to an individual or a group when an incident occurs.
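The three possible outcomes can be illustrated with a toy evaluator. This is not Zabbix's expression parser. It only models what an expression such as `{web01:system.cpu.load.avg(300)}>5` computes: the average of the recent values compared against a threshold, with no data meaning unknown. The host, key and function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def evaluate_avg_trigger(values: Sequence[float], threshold: float) -> Optional[bool]:
    """True = PROBLEM, False = OK, None = unknown (e.g. the item has no data in the window)."""
    if not values:
        return None
    return mean(values) > threshold


def trigger_state(result: Optional[bool]) -> str:
    return {True: "PROBLEM", False: "OK", None: "UNKNOWN"}[result]
```

A state change between these values (for instance OK to PROBLEM) is what generates an event that actions can react to.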

Trigger dependencies are another important concept. In a simple example, if a server is located behind a router and the router goes down, Zabbix will send notifications for both devices, even though the server is unreachable only because of the router. This is where dependencies between services on separate hosts are particularly useful: with the dependency properly set up, notifications for the dependent trigger are suppressed, and only the notification for the root problem is sent to the appropriate recipient, avoiding unnecessary multiple alerts.


5.4.1. Triggers

Triggers consist of three important elements: a trigger expression, trigger dependencies and a severity level. The expression is used to calculate the trigger state, dependencies make sure that only the relevant alerts are sent out, and the severity defines how important a trigger is.

Severity is essentially a label attached to a trigger. The web frontend displays different severities in different colors, and different actions can be created based on them, but severities have no further meaning or function in the system.

Table 5.1 shows the list of available severity levels, along with their descriptions and a

suggested color [5]:

Severity       | Definition                        | Color
Not classified | Unknown severity.                 | Grey
Information    | For information purposes.         | Light blue
Warning        | Be warned.                        | Yellow
Average        | Average problem.                  | Orange
High           | Something important has happened. | Light red
Disaster       | Disaster. Financial losses, etc.  | Red

Table 5.1 Severity levels with corresponding definitions and suggested colors

The severities are used for:

● the visual representation of triggers (different colors for different severities);

● audio in global alarms (different audio for different severities);

● user media (different notification channels for different severities, for example, SMS for high severity and email for the others);

● limiting actions by conditions against trigger severities.

All trigger severity names and their corresponding colors are customizable.
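The user media use of severities can be sketched as a simple filter. This sketch assumes the severities are numbered 0 (Not classified) to 5 (Disaster) in the order of Table 5.1. Each medium accepts a set of severities, similar to the "Use if severity" setting of a user's media. The media names and the mapping are hypothetical.

```python
SEVERITIES = ["Not classified", "Information", "Warning", "Average", "High", "Disaster"]  # 0..5


def channels_for(severity: int, media: dict) -> list:
    """Return the notification channels whose severity filter includes this trigger severity."""
    return [name for name, accepted in media.items() if severity in accepted]


# Example user media: SMS only for High and Disaster, email for the lower severities.
user_media = {"SMS": {4, 5}, "Email": {0, 1, 2, 3}}
```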

5.4.2. Actions

Actions can be defined in response to events of all supported types [5]:

● Trigger events - when trigger status changes from OK to PROBLEM and back;

● Discovery events - when network discovery takes place;

● Auto registration events - when new active agents auto-register;

● Internal events - when items become unsupported or triggers go into an unknown

state.


Actions are completely independent of hosts and templates. They are defined globally, and their conditions are checked against every individual Zabbix event. Every action consists of four elements: the action definition, action conditions, action operations and action escalations.

The action definition sets a name for the action and can also define a default message to be sent as part of the action itself. That message can contain data about the event (such as the host, item and trigger names, and the item and trigger values). Using macros is highly encouraged to make the message more useful and expressive.

Action conditions are based on the event's host, trigger and trigger value. Just like trigger expressions, an action condition can combine several simple conditions with logical operators. Different conditions are available for each event type (trigger events, discovery events, auto-registration events and internal events).
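Zabbix offers several ways to combine action conditions: And, Or, And/Or and a custom expression. The sketch below models the And/Or mode, in which conditions of the same type are OR-ed together and the resulting groups are AND-ed. The event fields and example conditions are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Condition = Tuple[str, Callable[[dict], bool]]  # (condition type, predicate over the event)


def and_or(conditions: List[Condition], event: dict) -> bool:
    """Same-type conditions are OR-ed; different types are AND-ed."""
    groups: Dict[str, List[Callable[[dict], bool]]] = defaultdict(list)
    for ctype, predicate in conditions:
        groups[ctype].append(predicate)
    return all(any(p(event) for p in preds) for preds in groups.values())


# "(host group is Linux servers OR Databases) AND severity is at least High"
EXAMPLE_CONDITIONS: List[Condition] = [
    ("Host group", lambda e: e["group"] == "Linux servers"),
    ("Host group", lambda e: e["group"] == "Databases"),
    ("Trigger severity", lambda e: e["severity"] >= 4),
]
```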

Action operations can send a message or execute a remote command. For discovery and auto-registration events, additional operations are available: add host, remove host, enable host, disable host, add to group, delete from group, link to template, unlink from template and set host inventory mode.

Action escalations make it possible to build notification scenarios that go beyond a single immediate message. With properly configured escalations, notifications can be repeated until the problem is resolved, sending a notification can be delayed, and notifications can be escalated to another, “higher” user group. Remote commands can likewise be executed immediately or only when a problem has remained unresolved for a lengthy period of time, and, finally, recovery messages can be sent to the appropriate person. Actions are escalated according to escalation steps, each of which has a duration. A custom duration can be defined for an individual step, but the minimum duration of one escalation step is 60 seconds [5].
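The timing of escalation steps can be worked out as in the sketch below. It is a simplified model: each step lasts the default duration unless a custom duration is set for that step, and the 60-second minimum is enforced. The function name and the example durations are hypothetical.

```python
def escalation_schedule(steps: int, default_duration: int, custom: dict = None) -> list:
    """Return (step, start offset in seconds from the problem event) for each escalation step."""
    custom = custom or {}
    if default_duration < 60 or any(d < 60 for d in custom.values()):
        raise ValueError("an escalation step cannot be shorter than 60 seconds")
    schedule, offset = [], 0
    for step in range(1, steps + 1):
        schedule.append((step, offset))
        offset += custom.get(step, default_duration)
    return schedule
```

For example, with a one-hour default and a 10-minute step 2, step 1 could notify the on-call administrator at once. Step 3, which would notify a "higher" group, would then start 70 minutes after the problem.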

5.4.3. Trigger dependencies

Zabbix does not support dependencies between hosts directly, but they can be defined through trigger dependencies. A trigger may depend on one or more other triggers. Before changing the status of a trigger, Zabbix checks for its dependencies. If dependencies are found and one of those triggers is in the “Problem” state, the trigger status is not changed, actions are not executed and notifications are not sent.


It is important to note that events and actions for dependent triggers will not be suppressed if the trigger they depend on is disabled, or if its item or that item's host is disabled.
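The router and server example can be expressed as a short sketch of the suppression rule. It models only direct dependencies. It treats a disabled trigger, item or host as "not enabled", which means it does not suppress anything. The class, function and trigger names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trigger:
    name: str
    problem: bool = False
    enabled: bool = True  # False also stands for a disabled item or host
    depends_on: List["Trigger"] = field(default_factory=list)


def suppressed(trigger: Trigger) -> bool:
    """A trigger is held back if an enabled trigger it depends on is in the PROBLEM state."""
    return any(parent.enabled and parent.problem for parent in trigger.depends_on)


def notifications(triggers: List[Trigger]) -> List[str]:
    return [t.name for t in triggers if t.enabled and t.problem and not suppressed(t)]


router = Trigger("Router unreachable", problem=True)
server = Trigger("Server unreachable", problem=True, depends_on=[router])
```

With both triggers in the PROBLEM state, only the router's notification goes out.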

5.5. Templates

Templates represent the core approach to monitoring in Zabbix. They simplify adding a new host: the host is simply linked to a previously created template, after which it automatically inherits all of the template's entities (such as items, triggers and graphs).

Templates are predefined sets of entities that can be quickly applied to multiple hosts, and they are a perfect complement to the discovery feature: together, they allow new hosts to be added and configured automatically instead of being created manually. Low-level discovery by the Zabbix agent also plays a part, because it can automatically create the correct items for entities whose number varies from host to host, such as disks, file systems and network interfaces [1].
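Low-level discovery can be illustrated by expanding item prototypes against discovery output. This sketch assumes the Zabbix 3.x LLD JSON format, a top-level `data` array of macro/value objects, as returned by the agent's `vfs.fs.discovery` key. The file system values and the helper name are illustrative.

```python
import json

# Illustrative discovery output: one object per discovered file system.
DISCOVERY_OUTPUT = json.dumps({"data": [
    {"{#FSNAME}": "/", "{#FSTYPE}": "ext4"},
    {"{#FSNAME}": "/var", "{#FSTYPE}": "xfs"},
]})


def expand_prototypes(discovery_json: str, prototypes: list) -> list:
    """Create one concrete item key per discovered entity and item prototype."""
    keys = []
    for entity in json.loads(discovery_json)["data"]:
        for prototype in prototypes:
            key = prototype
            for macro, value in entity.items():
                key = key.replace(macro, value)
            keys.append(key)
    return keys
```

A host with two file systems therefore gets two free-space items from a single prototype, and a host with five gets five, without any manual configuration.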

Templates are often used to group entities for particular services or applications (like

Apache, MySQL, PostgreSQL or Postfix) and then applied to hosts running those

services. Entities may include various items, triggers, graphs, applications, screens, low-

level discovery rules and web scenarios.

A fresh installation of Zabbix comes with 38 pre-made templates, some of which are shown in Figure 5.8. They can be accessed, modified and used in the Templates tab.

Figure 5.8 Available templates in a fresh installation of Zabbix

As seen in Figure 5.8, many templates come out of the box, and many more user-created templates exist for a variety of applications. A large share of these templates is based on some sort of script (PHP/Perl/Python) that polls the application and sends the data back to the server.

Host templates are very similar to regular hosts. Both need a unique name, can belong to

one or more groups and are collections of items, triggers, graphs, screens, and low-level

discovery rules. The crucial difference is that a host can be contacted through one or

more means (one or more IP addresses or a DNS record) so that the Zabbix server can

actually take measurements on it. A template, on the other hand, does not have an

access interface, so the Zabbix server will never try to check if a template is alive or ask it

for the item measurements.

5.5.1. Macros

Macros make it possible to write a message general enough to be applied to a wide range of events. It is the Zabbix server's job to substitute all the macros in a message with actual content based on the specific event it is handling. Some of the predefined macros include:

Name        | Translates to                         | Notes
{HOST.CONN} | Hostname or IP address of the host    | Identical to either {HOST.IP} or {HOST.DNS}, depending on the Connect to option in the host's configuration form.
{HOST.DNS}  | The host's hostname                   | Must correspond to the host's fully qualified domain name as defined in the domain's DNS server.
{HOST.HOST} | The host's name as defined in Zabbix  | The main host identifier. It must be unique for the specific Zabbix server. If an agent is used, the same name must be present in the agent's configuration on the host.
{HOST.IP}   | The host's IP address                 | In trigger-based messages, the indexed forms {HOST.IP1}, {HOST.IP2}, and so on up to {HOST.IP9}, refer to the host of the first, second, ... ninth item in the trigger expression.

Table 5.2 Examples of predefined macros
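The server-side substitution can be sketched as follows. This is not Zabbix's macro engine. It resolves only the four macros of Table 5.2, taking values from a hypothetical host record in which `use_ip` plays the role of the Connect to option. Macros it does not know are left untouched.

```python
import re

MACRO = re.compile(r"\{[A-Z0-9.]+\}")


def expand_macros(text: str, host: dict) -> str:
    """Replace built-in host macros with values from a (hypothetical) host record."""
    conn = host["ip"] if host.get("use_ip", True) else host["dns"]  # the "Connect to" option
    values = {"{HOST.CONN}": conn, "{HOST.DNS}": host["dns"],
              "{HOST.HOST}": host["host"], "{HOST.IP}": host["ip"]}
    return MACRO.sub(lambda m: values.get(m.group(0), m.group(0)), text)


web01 = {"host": "web01", "ip": "192.0.2.10", "dns": "web01.example.com", "use_ip": False}
```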

A special class of macros consists of user-defined macros, which can be set at template level or at host level. The user can configure them in the Macros tab of the host or template configuration form. Such macros translate a custom label into a predefined value.

User-defined macros can be used everywhere built-in macros can be used. In a template, they are useful for defining common trigger thresholds, and they are even more useful in combination with nested templates (as described in Chapter 5.5.3).
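A template trigger could, for instance, compare a load average against `{$LOAD_MAX}` instead of a fixed number. The sketch below shows how such a macro might be resolved. It assumes a lookup order from the host level, through directly linked templates and then nested ones, to global macros, which roughly follows the order Zabbix documents. The macro name and values are hypothetical.

```python
from typing import Dict, List, Optional


def resolve_user_macro(name: str, host_macros: Dict[str, str],
                       template_levels: List[Dict[str, str]],
                       global_macros: Dict[str, str]) -> Optional[str]:
    """Look a user macro up from the most specific level to the most general one.

    template_levels is ordered from directly linked templates to deeper nested ones.
    """
    for level in [host_macros, *template_levels, global_macros]:
        if name in level:
            return level[name]
    return None  # unresolved: the macro text would stay as it is
```

A single threshold defined in a template therefore serves every linked host, while an individual host can still override it.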

Macros are often used in combination with external scripts. Since external scripts share no information with the rest of Zabbix other than the arguments they are passed and their return value, it is often essential to include the host's IP address or hostname as one of the arguments. This ensures that the script connects to the right host and collects the right data. Thanks to the template system and macros such as {HOST.CONN} or {HOST.IP}, a single, well-configured script can perform the same operation on many different hosts.
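As an illustration, an external check item with a key such as `port_check.py["{HOST.CONN}","22"]` would make Zabbix run the script with the substituted host address as its first argument. The script's printed output then becomes the item value. The script below is a hypothetical example of such a check. It reports whether a TCP port accepts connections.

```python
#!/usr/bin/env python3
"""Hypothetical external check: port_check.py <host> <port> prints 1 (open) or 0 (closed)."""
import socket
import sys


def port_open(host: str, port: int, timeout: float = 2.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, otherwise 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    print(port_open(sys.argv[1], int(sys.argv[2])))
```

Because the host comes from the macro, linking the same template to many hosts reuses this one script unchanged.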

5.5.2. Linking templates to hosts

Once linked, a host inherits all of the template's entities. Previously existing entities with the same name are overwritten, but entities not included in the template remain as they were before the linking operation. Unlinking a template from a host does not remove the inherited entities unless the template is unlinked and cleared. Clearing also deletes all of the history and trends of the affected items.
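Linking and unlinking can also be done through the API (Chapter 6.2). The sketch below only builds the JSON-RPC request bodies. It assumes the `host.massadd` method (`templates` parameter) and the `host.massremove` method (`templateids` and `templateids_clear` parameters) as documented for Zabbix 3.x; these should be checked against the installed version. The IDs and the token are placeholders.

```python
import json


def rpc(method: str, params: dict, auth: str, req_id: int = 1) -> str:
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params,
                       "auth": auth, "id": req_id})


def link_template(auth: str, hostid: str, templateid: str) -> str:
    """Link a template to a host without touching the host's other templates."""
    return rpc("host.massadd", {"hosts": [{"hostid": hostid}],
                                "templates": [{"templateid": templateid}]}, auth)


def unlink_template(auth: str, hostid: str, templateid: str, clear: bool = False) -> str:
    """Unlink a template; with clear=True its entities, history and trends are removed too."""
    key = "templateids_clear" if clear else "templateids"
    return rpc("host.massremove", {"hostids": [hostid], key: [templateid]}, auth)
```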

One important thing to keep in mind: if an entity (item, trigger or graph) is modified from a template's configuration tab, the modifications are applied immediately to all linked hosts. On the other hand, if a template entity is edited from a particular host's configuration tab, the changes apply only to that host and not at the template level. While this can be useful for addressing special circumstances on an otherwise regular host, it can also cause confusion if many local changes are made. If used often, it will make it considerably more difficult to keep track of changes over time [1].

5.5.3. Nesting templates

Nesting is a way of hierarchically connecting a template to one or more other templates. The benefit of nesting is that when a host is linked to one template, it automatically inherits all entities of the templates linked to that template.

The first application of nested templates is to make user macros even more general. Since a template inherits all the entities and properties of its linked templates, any custom macro is also inherited and thus made available to the actual monitored hosts.

A real-world scenario is a web server running, for example, Apache, MySQL and PHP. The solution is to create three templates, one for each distinct service (in this case Apache, MySQL and PHP), then create another template (for example, Webserver) and link the three templates created earlier to it. When a new host is linked to the Webserver template, it automatically inherits the items and properties of all three templates. If necessary, each of the three templates can still be linked individually to any number of hosts.
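The inheritance in the Webserver example amounts to a recursive union over linked templates. The sketch below shows this. The template contents and item keys are hypothetical, and real templates also carry triggers, graphs and macros, which would be collected the same way.

```python
from typing import Dict, Optional, Set

# Hypothetical templates: their own items plus the templates linked to them.
TEMPLATES: Dict[str, dict] = {
    "Apache": {"items": {"apache.workers"}, "linked": []},
    "MySQL": {"items": {"mysql.qps"}, "linked": []},
    "PHP": {"items": {"php.fpm.processes"}, "linked": []},
    "Webserver": {"items": set(), "linked": ["Apache", "MySQL", "PHP"]},
}


def inherited_items(template: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect a template's own items and, recursively, those of every template linked to it."""
    seen = set() if seen is None else seen
    if template in seen:  # defensive guard against accidental loops
        return set()
    seen.add(template)
    items = set(TEMPLATES[template]["items"])
    for child in TEMPLATES[template]["linked"]:
        items |= inherited_items(child, seen)
    return items
```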

5.5.4. Discovering hosts

Another way of linking templates to hosts is to let the Zabbix server do it automatically by

combining Zabbix's host-discovery facility with discovery actions. Zabbix's discovery

facilities consist of a set of rules that periodically scan the network. New hosts are added

according to predetermined conditions. The methods in Zabbix that can be used to check

for new or disappeared hosts, given an IP range, are:

● The availability of a Zabbix agent;

● The availability of an SNMP agent;

● Response to simple external checks (FTP, SSH, etc.);

● A combination of the above.
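Before any of these checks can run, a discovery rule's IP range has to be expanded into individual addresses. The sketch below covers IPv4 only. It assumes range notations such as `192.168.1.1-254` and CIDR blocks such as `192.168.4.0/24`, optionally comma-separated. It skips network and broadcast addresses in CIDR blocks, which is a choice of this sketch. The function name is hypothetical.

```python
import ipaddress
from typing import List


def expand_ip_range(spec: str) -> List[str]:
    """Expand '192.168.1.1-254', '192.168.4.0/30' or comma-separated combinations."""
    addresses: List[str] = []
    for part in spec.split(","):
        part = part.strip()
        if "/" in part:
            addresses += [str(ip) for ip in ipaddress.ip_network(part, strict=False).hosts()]
        elif "-" in part:
            base, last = part.rsplit("-", 1)
            prefix, first = base.rsplit(".", 1)
            addresses += [f"{prefix}.{n}" for n in range(int(first), int(last) + 1)]
        else:
            addresses.append(part)
    return addresses
```

Each resulting address would then be probed with the checks configured in the rule, such as agent availability or an SSH service check.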

Upon successful discovery, a template can be linked to the newly discovered host. This high level of automation is very useful in rapidly changing environments that still show a good level of predictability in the kinds of hosts they contain, such as dynamic virtual environments. In such environments, new hosts can appear on a daily basis and old hosts can disappear at any time, but the hosts are fairly similar to each other.

However, a suggested best practice is to limit discovery actions to sending messages about discovered and disappeared hosts, so that the team responsible for monitoring stays up to date about the hosts and can take appropriate action when a new host appears or an old one disappears.


5.6. Reports and capacity planning

Users have access to a variety of predefined and user-customizable reports that give an overview of parameters such as the status of Zabbix, triggers and gathered data. A simple example is the Status of Zabbix report, which shows the following information [5]:

Parameter | Value | Details
Zabbix server is running | Status of the Zabbix server: Yes (the server is running) or No (the server is not running). | Location and port of the Zabbix server.
Number of hosts | Total number of configured hosts. Templates are counted as a type of host too. | Number of monitored hosts/not monitored hosts/templates.
Number of items | Total number of items. Only items assigned to enabled hosts are counted. | Number of monitored/disabled/unsupported items.
Number of triggers | Total number of triggers. Only triggers assigned to enabled hosts and depending on enabled items are counted. | Enabled/disabled triggers (triggers in Problem/OK state).
Number of users | Total number of configured users. | Number of users online.
Required server performance, new values per second | The number of new values per second that the Zabbix server is currently expected to process. | -

Table 5.3 Information fields in the “Status of Zabbix” report
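The "required server performance" figure can be approximated from item update intervals, since an item polled every N seconds contributes 1/N new values per second. The sketch below assumes this is roughly how the figure is derived. Items without a polling interval, such as trappers (interval 0), are skipped. The function name and example counts are hypothetical.

```python
from typing import Iterable


def required_nvps(update_intervals: Iterable[int]) -> float:
    """Estimate new values per second: each polled item adds 1 / its interval (seconds)."""
    return sum(1.0 / delay for delay in update_intervals if delay > 0)
```

For example, 600 items polled every 60 s plus 300 items polled every 30 s come to about 20 new values per second.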

5.6.1. Availability reports

In an Availability report, a user can see what proportion of time each trigger has spent in the “Problem” and “OK” states; the percentage of time is displayed for each state. Triggers can be displayed either by host or by the template they belong to.
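The percentages in such a report come from the time spent between state changes. The sketch below computes them for one trigger over a reporting period. It assumes a hypothetical input list of `(timestamp, new_state)` changes sorted by time, and a known state at the start of the period.

```python
from typing import List, Tuple


def availability(events: List[Tuple[int, str]], start: int, end: int, initial: str = "OK") -> dict:
    """Percentage of [start, end) spent in each state, given sorted (timestamp, new_state) changes."""
    durations = {"OK": 0, "PROBLEM": 0}
    state, cursor = initial, start
    for ts, new_state in events:
        ts = min(max(ts, start), end)  # clip changes to the reporting period
        durations[state] += ts - cursor
        state, cursor = new_state, ts
    durations[state] += end - cursor
    total = end - start
    return {s: 100.0 * d / total for s, d in durations.items()}
```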


5.6.2. Trigger frequency reports

The Trigger top 100 report shows triggers that have changed their state most often within

the desired period of evaluation, sorted by the number of status changes. Both host and

trigger column entries are links that offer some useful options:

● for the host - links to user-defined scripts, latest data, inventory, graphs and

screens for the host;

● for the trigger - links to latest events, the trigger configuration form and a simple

graph.
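Ranking triggers by how often they changed state is a simple counting problem, as the sketch below shows. It assumes a hypothetical input of trigger names, one entry per state change within the evaluated period.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_triggers(state_changes: Iterable[str], limit: int = 100) -> List[Tuple[str, int]]:
    """Rank triggers by the number of state changes they produced in the evaluated period."""
    return Counter(state_changes).most_common(limit)
```

Triggers at the top of such a list are often "flapping" and may need a better threshold or hysteresis.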

5.6.3. Capacity planning

With the data collected by Zabbix, it is easy to analyze, for example, the growth of disk usage and estimate when the available space will be filled. That way, critical incidents such as a power overload, the overuse of an Internet link or the exhaustion of storage space can be prevented.
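One simple way to turn collected history into such an estimate is a least-squares linear trend. The sketch below fits that trend to `(timestamp, used)` samples and extrapolates it to the capacity. This is a plain statistical technique, not a description of a specific Zabbix function. The sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def seconds_until_full(samples: List[Tuple[float, float]], capacity: float) -> Optional[float]:
    """Fit used = a + b*t by least squares; return the time after the last sample to reach capacity."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # usage is flat or shrinking: no exhaustion forecast
    intercept = mean_u - slope * mean_t
    return max(0.0, (capacity - intercept) / slope - samples[-1][0])
```

With daily samples of 50, 60 and 70 GB on a 100 GB volume, the trend predicts the volume will be full three days after the last sample.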

When properly configured, Zabbix can detect wasted CPU, memory, disk or network bandwidth on a single device or across an entire group of servers. With proper planning, the responsible person can then reallocate applications and equipment to use the available resources wisely.


6. Integration

Zabbix can be integrated with any number of third-party software products through its API (Application Programming Interface). Zabbix mobile clients are based on the API, and even the native web interface relies heavily on it. The API plays a significant role in integrating Zabbix with third-party software such as configuration and incident management systems, as well as in automating routine tasks.

In environments managed through configuration management systems such as Salt, Puppet, Chef or others, Zabbix integration can reduce the time needed to add, remove or upgrade hardware or software, even when hundreds or thousands of new hosts are involved.
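A configuration management tool would typically register hosts in bulk by calling `host.create` for each inventory entry. The sketch below only builds the request bodies. It assumes the Zabbix 3.x parameters for one agent interface (`type` 1, port 10050), plus `groups` and `templates`. The inventory structure, group ID and template ID are hypothetical.

```python
import json
from typing import List


def host_create_request(auth: str, name: str, ip: str, groupid: str,
                        templateid: str, req_id: int) -> dict:
    """JSON-RPC body for host.create with a single Zabbix agent interface."""
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": name,
            "interfaces": [{"type": 1, "main": 1, "useip": 1, "ip": ip, "dns": "", "port": "10050"}],
            "groups": [{"groupid": groupid}],
            "templates": [{"templateid": templateid}],
        },
        "auth": auth,
        "id": req_id,
    }


def provisioning_batch(auth: str, inventory: List[dict], groupid: str, templateid: str) -> List[str]:
    """One serialized request per host taken from a configuration management inventory."""
    return [json.dumps(host_create_request(auth, h["name"], h["ip"], groupid, templateid, i))
            for i, h in enumerate(inventory, start=1)]
```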

6.1. Third-party tools and applications

Third-party tools extend the functionality of the Zabbix server and let users take advantage of external scripts and multiple applications with diverse sets of features. These include a number of different templates, scripts and a large number of applications developed to make monitoring with Zabbix even easier for end users.

6.1.1. External scripts and templates

The most popular templates cover the most common SQL and NoSQL databases (such as MySQL, PostgreSQL, MongoDB and Redis), queue managers, web servers and various network devices. Some of them are described below.

A MySQL template contains over 80 items, but it requires the following macros to be defined at host level (the host inherits the objects from the attached template): {$MYSQL_PWD} for the MySQL password, {$MYSQL_USER} for the MySQL user and {$DATABASE_NAME} for retrieving the size of the database.
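These three host-level macros could also be set through the API instead of the frontend. The sketch below builds one `usermacro.create` request body per macro. It assumes the method's `hostid`, `macro` and `value` parameters. The host ID, credentials and database name are placeholders.

```python
def mysql_macro_requests(auth: str, hostid: str, user: str, password: str, database: str) -> list:
    """JSON-RPC bodies defining the host-level macros the MySQL template expects."""
    macros = {"{$MYSQL_USER}": user, "{$MYSQL_PWD}": password, "{$DATABASE_NAME}": database}
    return [{"jsonrpc": "2.0", "method": "usermacro.create",
             "params": {"hostid": hostid, "macro": name, "value": value},
             "auth": auth, "id": i}
            for i, (name, value) in enumerate(macros.items(), start=1)]
```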

PostgreSQL monitoring template for Zabbix (pg_monz) is a template for monitoring PostgreSQL. It offers various types of monitoring, such as alive, resource and performance monitoring, among others. With pg_monz, it is also possible to monitor streaming replication and load balancing for pgpool-II, a PostgreSQL middleware.

Another option, libzbxpgsql (PostgreSQL monitoring for Zabbix), provides detailed and granular monitoring of PostgreSQL servers using a native Zabbix agent module, with highly configurable item keys and a complementary monitoring template.


Grafana-Zabbix allows connecting Zabbix to the Grafana metric dashboard (grafana.org).

Rabbitmq-zabbix contains a template and checks for monitoring RabbitMQ queues and servers via Zabbix.

An Nginx template collects summary values, parses logs and provides graphs, using trapper items.

An ESXi SNMP template can be used to discover datastores and virtual machines, and it can also track host memory usage.

6.1.2. Android, iOS and desktop applications

The Zabbix community developed a range of different applications and add-ons to help

users react quickly to incidents and resolve problems. A typical feature list includes basics

such as displaying a list of active triggers and hosts, item overview and graphs. Some

examples include [16]:

Andzabbix is a native Android client for the Zabbix monitoring system. It provides a simple-to-use client with many features, such as displaying a list of active triggers and hosts, with an item overview and all graphs. Andzabbix can check for new active triggers in the background and notify the user about them. Additionally, it supports multiple servers, SSL connections and basic authentication.

Chromix is a Chrome extension for checking Zabbix trigger events using the Zabbix API. When a Zabbix trigger status changes to "PROBLEM", the extension generates a desktop notification. It can also manage multiple Zabbix servers.

HyClops is a Zabbix extension that allows automatic monitoring of the services a customer uses at AWS (Amazon Web Services) and VMware vSphere. The extension uses the AWS, vSphere and Zabbix APIs to provide a seamless experience and data exchange. The user only needs to provide their AWS or vSphere ESXi account management information, and HyClops automatically registers all hosts (one per instance) and VM information.

Mobbix is a mobile Zabbix client interface for Android. This smartphone application allows

Zabbix users to easily and conveniently manage incidents on their nodes monitored by the

Zabbix server.

MobileOp is an iOS Zabbix client that brings some basic functions of the Zabbix front-end interface, plus other improvements useful in mobile scenarios. The app is intended for IT managers and/or administrators/operators of a Zabbix setup, to simplify operations such as accessing information, tracking alarms, modifying configuration and viewing all data managed by a Zabbix-monitored ICT infrastructure.

MobileOp extends Zabbix functionality in mobile environments by enabling users to receive Zabbix alert/trigger notifications as iOS push notifications.

Zabbifier is a simple Mac OS X client for the Zabbix monitoring system. It can check multiple Zabbix servers and show all active triggers; it also notifies the user about problems using Growl and the status bar icon.

6.2. API

The Zabbix API allows retrieving and modifying the Zabbix configuration and provides access to historical data. It was introduced in version 1.8 and is widely used to create new applications that work with Zabbix, to integrate Zabbix with third-party software and to automate various routine tasks.

The Zabbix API is a part of the web frontend and it uses the JSON-RPC 2.0 protocol. The

API consists of a number of methods grouped into separate APIs. Each method performs

one specific task, for example, the host.create method is used to create new hosts.

When the web interface is up, remote HTTP requests can be directed to the API by sending HTTP POST requests to the api_jsonrpc.php file located in the frontend directory. For example, if the Zabbix frontend is installed under http://some-domain.com/zabbix, the HTTP request to call the apiinfo.version method may look like this:

POST http://some-domain.com/zabbix/api_jsonrpc.php HTTP/1.1
Content-Type: application/json-rpc

{"jsonrpc":"2.0","method":"apiinfo.version","id":1,"auth":null,"params":{}}

The request must have the Content-Type header set to one of these values:

application/json-rpc, application/json or application/jsonrequest.

Before any data inside Zabbix can be accessed, the API requires the user to log in and obtain an authentication token, which is done with the user.login method. For the default Zabbix Admin user, a sample JSON request looks like this:

{
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {
        "user": "Admin",
        "password": "zabbix"
    },
    "id": 1,
    "auth": null
}

The request has the following properties:

● jsonrpc - the version of the JSON-RPC protocol used by the API,
● method - the method being called,
● params - the parameters that will be passed to the method,
● id - an arbitrary identifier of the request,
● auth - a user authentication token.

If the provided credentials are correct, the response returned by the API will contain the user authentication token:

{
    "jsonrpc": "2.0",
    "result": "0424bd59b807674191e7d77572075f33",
    "id": 1
}
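The login step reduces to two small operations: building the request and pulling the token out of the reply. The sketch below is illustrative only; the helper names are hypothetical, the "user" parameter follows the 3.0-era example above (parameter names may differ in other Zabbix versions), and it assumes the JSON-RPC convention that a failed call returns an "error" object instead of "result".

```python
def login_payload(user, password, request_id=1):
    """Build the user.login request shown above (parameter names as in the example)."""
    return {
        "jsonrpc": "2.0",
        "method": "user.login",
        "params": {"user": user, "password": password},
        "id": request_id,
        "auth": None,
    }


def extract_token(response):
    """Return the authentication token from a user.login reply.

    A JSON-RPC error reply carries an "error" object instead of "result";
    it is turned into an exception so a failed login is never mistaken for a token.
    """
    if "error" in response:
        error = response["error"]
        raise RuntimeError(
            "API error {}: {} {}".format(
                error.get("code"), error.get("message"), error.get("data", "")
            ).strip()
        )
    return response["result"]
```

With the sample reply above, extract_token returns "0424bd59b807674191e7d77572075f33", which is then passed as "auth" in every subsequent request.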

Once the authentication token has been retrieved, the user can access data in Zabbix. For example, the following request retrieves the IDs, host names and interfaces of all configured hosts:

{
    "jsonrpc": "2.0",
    "method": "host.get",
    "params": {
        "output": [
            "hostid",
            "host"
        ],
        "selectInterfaces": [
            "interfaceid",
            "ip"
        ]
    },
    "id": 2,
    "auth": "0424bd59b807674191e7d77572075f33"
}


The response will contain something similar to:

{
    "jsonrpc": "2.0",
    "result": [
        {
            "hostid": "10084",
            "host": "Zabbix server",
            "interfaces": [
                {
                    "interfaceid": "1",
                    "ip": "127.0.0.1"
                }
            ]
        }
    ],
    "id": 2
}
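Because host.get nests interfaces inside each host, client code usually flattens the result before use. A minimal sketch with hypothetical helper names: it rebuilds the request above for a given token and turns the reply into one (hostid, host, ip) tuple per interface.

```python
def host_get_payload(token, request_id=2):
    """Build the host.get request from the text for a given authentication token."""
    return {
        "jsonrpc": "2.0",
        "method": "host.get",
        "params": {
            "output": ["hostid", "host"],
            "selectInterfaces": ["interfaceid", "ip"],
        },
        "id": request_id,
        "auth": token,
    }


def host_addresses(response):
    """Flatten a host.get reply into (hostid, host, ip) tuples, one per interface."""
    rows = []
    for host in response.get("result", []):
        for interface in host.get("interfaces", []):
            rows.append((host["hostid"], host["host"], interface["ip"]))
    return rows
```

Applied to the sample response above, host_addresses returns [("10084", "Zabbix server", "127.0.0.1")]; a host with several interfaces would simply contribute several rows.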

The API can greatly improve and extend the functionality of Zabbix, and the range of possible applications is very broad.

6.3. Use-cases

Possible use cases for API integration include connections to external systems, other monitoring tools and services. Examples of integration with other tools are presented on the following pages, with an emphasis on integration via the Zabbix API.

6.3.1. PagerDuty

PagerDuty is an enterprise incident resolution service that integrates with various monitoring solutions to improve operational reliability and agility. By enriching and aggregating events and correlating them into incidents, PagerDuty streamlines the incident management process, reducing alert noise and resolution times.

PagerDuty extends Zabbix’s functionality by providing on-call scheduling, alerting and incident tracking through the PagerDuty API. PagerDuty notifies the responsible staff about the most critical Zabbix events so that immediate action can be taken [11].

When integrating Zabbix with PagerDuty, the PagerDuty agent must first be installed on the Zabbix server. The agent is a helper program that can be installed on a monitoring system to integrate monitoring tools with PagerDuty. In the PagerDuty web interface, a new Service is created, an Integration Name is entered and Zabbix is chosen from the Integration Type menu. In Incident Settings, the Escalation Policy, Notification Urgency and Incident Behavior are specified for the new service. On the Zabbix side, PagerDuty is created as a new media type. A PagerDuty user group is created to set the access permissions for the list of hosts that will be monitored with PagerDuty, and when the PagerDuty user is created, the corresponding media type is assigned to it. For the integration to work, the PagerDuty media type must contain the following parameters: {ALERT.SENDTO}, {ALERT.SUBJECT} and {ALERT.MESSAGE}.
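To show how the three macros reach the receiving side, here is a minimal sketch of a media-type script in Python. It is not PagerDuty's agent or its integration script: the function names and the OK/PROBLEM rule are hypothetical, and it assumes the three parameters are passed as positional arguments in the order listed above, with {ALERT.SENDTO} (the user's "Send to" field) assumed here to carry the PagerDuty service key.

```python
def parse_alert(argv):
    """Map a media-type script's positional arguments to named fields.

    Assumes the media type passes {ALERT.SENDTO}, {ALERT.SUBJECT} and
    {ALERT.MESSAGE} in that order, so argv[1:4] holds the expanded values.
    """
    if len(argv) != 4:
        raise ValueError("expected 3 arguments: <sendto> <subject> <message>")
    _, send_to, subject, message = argv
    return {"send_to": send_to, "subject": subject, "message": message}


def event_action(subject):
    """Hypothetical rule: subjects starting with "OK" are recoveries, anything else is a problem."""
    return "resolve" if subject.strip().upper().startswith("OK") else "trigger"
```

A script entry point would call parse_alert(sys.argv) and then hand the event to the PagerDuty agent; that hand-off is deliberately left out, since its interface depends on the agent version installed.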

6.3.2. Ansible

Ansible is a simple IT automation engine that automates cloud provisioning, configuration management, application deployment and many other IT needs. It works over SSH (Secure Shell), so it needs no agents and no additional custom security infrastructure. This makes it easy to deploy, and, most importantly, it uses a very simple language (YAML, in the form of Ansible Playbooks) to describe automation jobs [12].

Ansible includes numerous modules; one of them, zabbix_host, provides a way to create, modify and delete Zabbix host entries and the associated group and template data.

Playbooks are Ansible’s configuration, deployment and orchestration language. They can describe a policy to be applied to a number of hosts, or a set of steps in a general IT process.

To add a host to Zabbix as part of an existing playbook, a task such as the following can be used:

- name: Create a new host or update an existing host's info
  local_action:
    module: zabbix_host
    server_url: http://some-domain.com
    login_user: Admin
    login_password: zabbix
    host_name: Host1
    host_groups:
      - Group1
      - Group2
    link_templates:
      - Template1
      - Template2
    status: enabled
    state: present
    inventory_mode: automatic
    interfaces:
      - type: 1
        main: 1
        useip: 1
        ip: 10.xx.xx.xx
        dns: ""
        port: 10050
      - type: 4
        main: 1
        useip: 1
        ip: 10.xx.xx.xx
        dns: ""
        port: 12345
    proxy: a.zabbix.proxy
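A module like zabbix_host ultimately has to talk to the Zabbix API. As an illustration only, the sketch below builds a host.create request roughly equivalent to the task above. Unlike the playbook, the API refers to groups, templates and proxies by numeric ID, so the IDs here are hypothetical placeholders (a real script would first look them up, e.g. with hostgroup.get), and the IP address stays elided as in the original. Interface type 1 is the Zabbix agent and type 4 is JMX.

```python
def host_create_payload(token, host, group_ids, template_ids, interfaces,
                        proxy_id=None, request_id=3):
    """Build a host.create request; groups, templates and proxy are given by ID."""
    params = {
        "host": host,
        "groups": [{"groupid": gid} for gid in group_ids],
        "templates": [{"templateid": tid} for tid in template_ids],
        "interfaces": interfaces,
        "status": 0,          # 0 = monitored host (the playbook's "status: enabled")
        "inventory_mode": 1,  # 1 = automatic host inventory
    }
    if proxy_id is not None:
        params["proxy_hostid"] = proxy_id
    return {"jsonrpc": "2.0", "method": "host.create", "params": params,
            "id": request_id, "auth": token}


def interface(kind, port, ip):
    """One host interface; kind 1 = Zabbix agent, 4 = JMX (as in the playbook)."""
    return {"type": kind, "main": 1, "useip": 1, "ip": ip, "dns": "", "port": str(port)}


# Hypothetical IDs standing in for Group1/Group2, Template1/Template2 and the proxy.
example = host_create_payload(
    token="0424bd59b807674191e7d77572075f33",
    host="Host1",
    group_ids=["2", "4"],
    template_ids=["10001", "10047"],
    interfaces=[interface(1, 10050, "10.xx.xx.xx"), interface(4, 12345, "10.xx.xx.xx")],
    proxy_id="10105",
)
```

The playbook is clearly more convenient, since the module resolves names to IDs itself; the raw request is mainly useful when Ansible is not available.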

Ansible can also put hosts into maintenance mode before a major update and remove the maintenance window afterwards, using the zabbix_maintenance module.
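For readers who prefer the API directly, such a maintenance window presumably corresponds to a maintenance.create call. A minimal sketch under stated assumptions: the field names follow our reading of the Zabbix 3.0 API reference, the host ID is hypothetical, and removing the window afterwards (maintenance.delete with the returned ID) is not shown.

```python
import time


def maintenance_payload(token, name, host_ids, start, duration_s, request_id=4):
    """Build a maintenance.create request for a one-off window of duration_s seconds.

    start is a Unix timestamp; data collection continues during the window.
    """
    end = start + duration_s
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.create",
        "params": {
            "name": name,
            "active_since": start,
            "active_till": end,
            "hostids": list(host_ids),
            "maintenance_type": 0,  # 0 = with data collection
            "timeperiods": [{
                "timeperiod_type": 0,  # 0 = one time only
                "start_date": start,
                "period": duration_s,
            }],
        },
        "id": request_id,
        "auth": token,
    }


# A two-hour window starting now, for a hypothetical host ID.
window = maintenance_payload("0424bd59b807674191e7d77572075f33",
                             "Big update", ["10084"], int(time.time()), 2 * 3600)
```

The window length appears twice, once as the active_since/active_till range and once as the period of the single time period, so computing both from one duration keeps them consistent.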

6.3.3. Issue tracking systems

Issue tracking systems are commonly used in large, enterprise-sized environments. They manage and maintain the lists of issues an organization needs to track, and are often used in customer support call centers to create, update and resolve reported customer issues.

One open-source example of an issue tracking system is Request Tracker (RT), a web application written in Perl that runs behind a web server on top of a relational database. RT features a simple REST API, which Zabbix can use to create tickets and keep track of existing ones. In addition, a powerful scripting engine that executes custom scripts not only allows RT to automate its internal workings and create custom workflows, but also lets it communicate with external systems using any available protocol.

The two basic elements of RT are tickets and queues. The function of a ticket is to keep track of the evolution of an issue. The basic lifecycle of a ticket can be summarized in the following four points:

● A ticket is created with the first description of the problem,
● An operator takes ownership of the ticket and starts working on it,
● The evolution of the problem is recorded in the ticket's history,
● After the problem is resolved, the ticket is closed and archived [1].


To integrate Zabbix with Request Tracker, it is recommended to create a custom queue for Zabbix. When customizing the tickets, trigger severity labels are commonly translated to ticket priority values. A suggested mapping is shown in Table 6.1.

Trigger severity label    Trigger severity value    Ticket priority value
Not classified            0                         0
Information               1                         20
Warning                   2                         40
Average                   3                         60
High                      4                         80
Disaster                  5                         100

Table 6.1 Trigger severity and ticket priority values

Custom fields can also be configured when customizing the tickets, making it possible to search through tickets by any custom parameter.
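The mapping in Table 6.1 can be applied mechanically when Zabbix opens a ticket. A minimal sketch: the queue name "Zabbix" follows the custom-queue recommendation above, the helper names are hypothetical, and the ticket text uses RT's REST 1.0 "field: value" format as we understand it (content posted to the /REST/1.0/ticket/new endpoint); the RT documentation for the installed version should be checked before relying on it.

```python
# Table 6.1: Zabbix trigger severity value -> suggested RT ticket priority.
SEVERITY_TO_PRIORITY = {0: 0, 1: 20, 2: 40, 3: 60, 4: 80, 5: 100}


def ticket_priority(severity):
    """Return the RT priority suggested in Table 6.1 for a trigger severity value (0-5)."""
    try:
        return SEVERITY_TO_PRIORITY[severity]
    except KeyError:
        raise ValueError(f"unknown trigger severity: {severity!r}") from None


def rt_ticket_content(queue, subject, severity, text):
    """Compose a new ticket in RT's REST 1.0 "field: value" text form (assumed format).

    Continuation lines of a multi-line field are indented by one space.
    """
    body = "\n ".join(text.splitlines())
    return "\n".join([
        "id: ticket/new",
        f"Queue: {queue}",
        f"Subject: {subject}",
        f"Priority: {ticket_priority(severity)}",
        f"Text: {body}",
    ])
```

The table is equivalent to multiplying the severity value by 20; an explicit dictionary is used instead so that an organization can adjust individual levels without touching the code logic.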

6.3.4. Importing Cacti data into Zabbix

Cacti is a network graphing solution based on RRDtool's data storage and graphing functionality. RRDtool is the open-source, industry-standard, high-performance data logging and graphing system for time series data. Cacti provides a fast data poller, graph templates and multiple data acquisition methods.

Cacti can maintain a large database of aggregated graphs containing data collected over years or even decades. When implementing a new monitoring system, it can prove quite useful to import some of this data (ideally, all of it) from Cacti into Zabbix. The procedure is not entirely straightforward, but it can be done.

The process assumes that the following components are present:

● An RRD file containing the legacy data to be imported (from Cacti, in this case);
● RRDtool;
● Zabbix (version 1.8 or later) installed along with zabbix_sender;
● A custom conversion script.

First, the RRD files must be converted to a parsable format. RRDtool, the tool Cacti uses to interact with RRD data files, provides the dump option, which converts RRD files to XML. It can be run with the following command:


# rrdtool dump example.rrd > example.xml

where example.rrd is the name of the file to be converted. This command produces a large XML file, example.xml, that contains all of the data from the RRD file.

In the second step, an item is created in Zabbix (an existing host and/or item can also be used). The host name and item key should be noted, as they are needed in the following steps, and the item must temporarily be changed to the Zabbix trapper type. Depending on the housekeeper settings, it may also be useful to increase the “Keep trends” value, so that the housekeeper does not delete the old imported data.
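Switching the item type does not have to be done by hand in the frontend; it can also go through the API described in Section 6.2. A minimal sketch under stated assumptions: the item ID is hypothetical, 0 (Zabbix agent) and 2 (Zabbix trapper) are the API's numeric item types, and trends is given in days as in the 3.0-era API (later versions may expect a different format).

```python
ZABBIX_AGENT = 0    # item type: Zabbix agent
ZABBIX_TRAPPER = 2  # item type: Zabbix trapper (accepts values pushed by zabbix_sender)


def item_update_payload(token, item_id, item_type, trends_days=None, request_id=5):
    """Build an item.update request changing the item type and, optionally, trend retention."""
    params = {"itemid": item_id, "type": item_type}
    if trends_days is not None:
        params["trends"] = trends_days  # days in the 3.0-era API (assumption for other versions)
    return {"jsonrpc": "2.0", "method": "item.update", "params": params,
            "id": request_id, "auth": token}


# Before the import: switch hypothetical item 23456 to trapper and keep about 10 years of trends.
before = item_update_payload("0424bd59b807674191e7d77572075f33", "23456", ZABBIX_TRAPPER, 3650)
# After the import: restore the original type.
after = item_update_payload("0424bd59b807674191e7d77572075f33", "23456", ZABBIX_AGENT)
```

Building both requests up front makes it harder to forget the final step of the procedure, where the item is switched back to its original type.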

Next, the custom script comes into play. To convert the XML file to a format usable by zabbix_sender, the script is run with the following command:

# convertRRD.sh example.xml <HostName> <ItemKey> <DataSourceNumber> outputFile

where example.xml is the input file from the previous step, HostName is the name of the Zabbix host, ItemKey is the key of the item, DataSourceNumber is the number of the data source in the RRD file, and outputFile is the name of the temporary file used in the next step.

In most cases DataSourceNumber=1 is used, but if the RRD file contains multiple data sources, the desired data source must be determined before proceeding. This is done with the command:

# rrdtool info example.rrd

The order in which the ds[] entries appear in the output determines the DataSourceNumber.
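The convertRRD.sh script itself is not reproduced here. As a stand-in, the following Python sketch performs the same kind of conversion under stated assumptions: it relies on the comment that rrdtool dump writes before each row (ending in "/ <Unix timestamp> -->"), treats NaN as a missing sample, and emits the <host> <key> <timestamp> <value> lines that zabbix_sender expects with -T. The function names are hypothetical and not taken from the original script.

```python
import math
import re

# Each data row in `rrdtool dump` output is preceded by a comment ending in
# "/ <unix timestamp> -->", e.g. <!-- 2016-09-14 12:00:00 CEST / 1473847200 --> <row>...</row>
ROW_RE = re.compile(r"/\s*(\d+)\s*-->\s*<row>(.*?)</row>")
VALUE_RE = re.compile(r"<v>\s*([^<]*?)\s*</v>")


def _field(text):
    """Double-quote a field containing whitespace (e.g. a host name like "Zabbix server")."""
    return f'"{text}"' if any(c.isspace() for c in text) else text


def convert_dump(lines, host, key, data_source=1):
    """Turn rrdtool dump XML lines into zabbix_sender -T input lines.

    data_source is 1-based, following the order of the ds entries in the RRD.
    NaN (unknown) samples are skipped; when several RRAs cover the same
    timestamp, the first non-NaN value found is kept. Output is sorted by time.
    """
    samples = {}
    for line in lines:
        match = ROW_RE.search(line)
        if not match:
            continue
        timestamp = int(match.group(1))
        values = VALUE_RE.findall(match.group(2))
        if not 1 <= data_source <= len(values):
            raise ValueError(f"data source {data_source} not present in row")
        value = float(values[data_source - 1])
        if math.isnan(value) or timestamp in samples:
            continue
        samples[timestamp] = value
    return [f"{_field(host)} {_field(key)} {ts} {samples[ts]}" for ts in sorted(samples)]


def convert_file(xml_path, host, key, data_source, output_path):
    """File-based wrapper taking the same arguments as the convertRRD.sh call above."""
    with open(xml_path, encoding="utf-8") as src:
        out_lines = convert_dump(src, host, key, data_source)
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(out_lines) + "\n")
    return len(out_lines)
```

For example, convert_file("example.xml", "Host1", "net.in", 1, "outputFile") would write the file that the next step passes to zabbix_sender. Values are written in Python's default float notation, which may need adjusting for very small or very large numbers.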

Next, the temporary file outputFile is copied to the Zabbix server and imported with the command:

# zabbix_sender -z 127.0.0.1 -T -i outputFile

If the import is successful, the Zabbix server returns a number of messages similar to:

Info from server: "Processed 50 Failed 0 Total 50 Seconds spent 0.02094"
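When many values are imported, the repeated per-batch messages are easier to check programmatically. A small sketch with a hypothetical helper: it assumes either the wording shown above or the semicolon-separated "processed: N; failed: N; total: N" wording that later zabbix_sender versions print, and sums the counters so that any failed values are easy to spot.

```python
import re

# Matches both the "Processed 50 Failed 0 Total 50" wording quoted above
# and the "processed: 50; failed: 0; total: 50" wording of later versions.
SUMMARY_RE = re.compile(
    r"processed:?\s*(\d+);?\s*failed:?\s*(\d+);?\s*total:?\s*(\d+)", re.IGNORECASE
)


def sender_totals(output):
    """Sum the processed/failed/total counters over all summary lines of zabbix_sender output."""
    processed = failed = total = 0
    for p, f, t in SUMMARY_RE.findall(output):
        processed += int(p)
        failed += int(f)
        total += int(t)
    return processed, failed, total
```

A non-zero failed count typically means the server rejected some values, for instance because the item is not (yet) a Zabbix trapper or the host name or item key does not match.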

Finally, the item is converted back to its correct type (e.g., Zabbix agent). After the import, the Zabbix server can use all of the imported data for trend prediction and data visualization.


6.4. Zabbix and the Internet of Things

Over the last few years, the IoT (Internet of Things) has become an important aspect of the future of information technology. A Thing on the Internet of Things can be any natural or man-made object that can be assigned an IP address and can communicate over a network: machines, objects, appliances, goods, buildings, vehicles, plants, even people. Each Thing must have a unique identity on the Internet, a requirement that IPv6 can satisfy. On a related note, the success of the IoT can be partly attributed to the vastly larger address space of IPv6, which continues to provide addresses now that the IPv4 address space has been declared fully exhausted.

The IoT offers a new way of interacting with the world, in which every object is connected to the Internet and shares information either with the public or only with a selected few.

Although the IoT brings connectivity to many new and diverse devices, what keeps them up and online has not changed: servers and related network devices still have to be monitored. Zabbix’s small, resource-efficient agent can run on a wide range of devices (a smart house sensor, a vending machine, an electronic device or anything else capable of running it). The agent collects device performance, availability and status data, or any other useful application metrics, and can communicate this information to other devices or to the cloud.

The number of connected devices will grow rapidly in the coming years and decades. Some of these devices will become much more than a sensor or a simple measurement tool; they will become indispensable parts of the network, with services relying on them. Proper monitoring can help keep them online and ensure that devices on the Internet of Things can exchange information without unnecessary interruptions.


7. Conclusion

No monitoring software can replace years of experience and judgment, but it can make important data more visible. Zabbix gives the system administrator insight into server activity, making it easier to identify trends, remove bottlenecks and respond to potential problems with more speed and precision.

Over the course of their careers, most system administrators, engineers and architects are confronted with unreliable networks and troublesome network devices. When things start to go wrong, downtime can be reduced or avoided if the problem is located and resolved as quickly as possible.

This thesis focused on establishing a way of monitoring network devices and web performance using Zabbix, an open-source system monitoring and reporting solution. Zabbix monitors infrastructure, services and processes, and the collected data can help a company develop detailed strategies and future objectives for its infrastructure. In addition, Zabbix can help with capacity planning, organization, inventory management and service implementation. Thanks to an agent with minimal hardware and software requirements, Zabbix is also well suited to Internet of Things projects.

With Zabbix, it is possible to gain control of all the information obtained from the network. All of this information can and should be reused in other applications (statistical, security or inventory) to produce an even greater benefit for the organization. Zabbix can present information not only about the availability and performance of an IT environment, but also about business metrics, key performance indicators (KPIs), the location of inventory items, various sensors (e.g., humidity, temperature, proximity or motion) and much of the other information that surrounds us in everyday operations.

With all its features, Zabbix helps a company take full advantage and control of its equipment, applications and data, bringing tangible benefits and an advantage over its competitors.


References

[1] Dalle Vacche, A., Kewan Lee, S. Mastering Zabbix, PACKT Publishing, 2013.
[2] Olups, R. Zabbix 1.8 Network Monitoring, PACKT Publishing, 2010.
[3] Vidmar, A. Zabbix – state of the art monitoring, available at https://www.linux.com/news/zabbix-state-art-network-monitoring?tid=129 [14.09.2016]
[4] Ramm, M. The watcher knows, available at http://www.linux-mag.com/id/1890/ [14.09.2016]
[5] Zabbix documentation, Requirements, available at https://www.zabbix.com/documentation/3.0/manual/installation/requirements [14.09.2016]
[6] Zabbix product overview, available at http://www.zabbix.com/product.php [14.09.2016]
[7] Nagios vs. Zabbix review, available at https://www.itcentralstation.com/product_reviews/zabbix-review-23246-by-arthur-freyman [14.09.2016]
[8] Kovacs, K. Zabbix vs Nagios comparison, available at http://kkovacs.eu/zabbix-vs-nagios [14.09.2016]
[9] Zabbix documentation, What is Zabbix, available at https://www.zabbix.com/documentation/3.0/manual/introduction/about [14.09.2016]
[10] Zabbix life cycle and release policy, available at http://www.zabbix.com/life_cycle_and_release_policy.php [14.09.2016]
[11] Zabbix integration guide, available at https://www.pagerduty.com/docs/guides/zabbix-integration-guide/ [14.09.2016]
[12] Ansible documentation, Zabbix host, available at http://docs.ansible.com/ansible/zabbix_host_module.html [14.09.2016]


[13] Zabbix global dashboard, available at http://www.zabbix.com/global_dashboard.php [14.09.2016]
[14] cURL command line tool and library, available at https://curl.haxx.se/ [14.09.2016]
[15] Karsin, B. Importing legacy cacti/MRTG data into Zabbix, available at http://blog.zabbix.com/importing-legacy-cactimrtg-data-into-zabbix/692/ [14.09.2016]
[16] Zabbix third party tools, available at http://www.zabbix.com/third_party_tools.php [14.09.2016]