Icinga 2012 at ACOnet on 6th TF-NOC meeting
DESCRIPTION
Presentation by Robert Wein, ACOnet. Original source: http://www.terena.org/activities/tf-noc/meeting6/programme.html
TRANSCRIPT
1
Monitoring @ ACOnet
Robert Wein, ACOnet NOC
TF-NOC, Dublin, 2012-06-05
Tuesday, 5 June 2012
ACOnet
■ ACOnet is the Austrian NREN, connecting
  ■ (all) Universities & Academies
  ■ Colleges & Research Institutes
  ■ Austrian School Network (edunet), Dormitories
  ■ Museums, educational and cultural institutions
  ■ Hospitals
  ■ Ministries, Federal Agencies
  ■ Federal Chancellery, Presidential Offices
  ■ Provincial Government and Administration
  ■ …
■ Legal Entity & Management: University of Vienna
■ Operation: UniVie + other Universities, fiber backbone by telco
2
current topology
3
Vienna Internet Exchange (VIX)
■ neutral, not-for-profit IXP
■ founded 1996
■ 107 participants (different AS numbers)
■ 65 Gbps peak traffic in May 2012
■ redundant setup - 2 sites
4
Monitoring status December 2010
■ Nagios/Cacti
  ■ integration in configuration authority database (ACOnetDB)
  ■ integration in web-portal
  ■ (intensive) use of check_rrd
  ■ outsourced maintenance and development - together with UniVie Campus
■ troubles
  ■ check_rrd causes a lot of IO load
  ■ integration of new platform in backbone (Cisco ASR9k)
  ■ a lot of CPU load from SNMP on Catalysts due to polling for values/thresholds _and_ statistics
  ■ outsourced maintenance and development
■ flow sampling to Arbor boxes
■ VIX: additional sFlow sampling, "VIXflow"
5
6
new monitoring setup
■ Icinga
  ■ Nagios fork
  ■ Developer@ACOnet-Team
■ pnp4nagios
  ■ takes perfdata and puts it into RRDs
■ check_mk
  ■ keeping inventory
  ■ generates Icinga-config
  ■ one active check for one device
  ■ python - just a small job to write your own checks :)
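To illustrate why writing a check in Python is "just a small job", here is a minimal sketch of a threshold check. It is plain Python and does not use the check_mk plugin API; the function name, the temperature example and the thresholds are hypothetical. Only the state codes (0 to 3) and the perfdata layout after the pipe follow the usual Nagios/Icinga plugin conventions.

```python
# Hypothetical sketch: plain Python, NOT the check_mk plugin API.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios/Icinga states
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_temperature(value_c, warn_c, crit_c):
    """Map one temperature reading to (state, status line with perfdata)."""
    if value_c is None:
        return UNKNOWN, "UNKNOWN - no temperature reading"
    if value_c >= crit_c:
        state = CRITICAL
    elif value_c >= warn_c:
        state = WARNING
    else:
        state = OK
    # Perfdata follows the pipe as label=value;warn;crit
    perfdata = "temp=%s;%s;%s" % (value_c, warn_c, crit_c)
    text = "%s - temperature is %s C" % (STATE_NAMES[state], value_c)
    return state, "%s | %s" % (text, perfdata)
```

The perfdata part of the status line is what a tool such as pnp4nagios would pick up and store in RRDs.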
7
Monitoring@ACOnet
■ integration
  ■ ACOnet Database/VIX Database
    ■ configuration authority
    ■ dispatcher writes dictionaries for check_mk and calls check_mk to generate the config
  ■ display of statistics in portal (per participant)
  ■ weathermap (standalone php)
  ■ display of relevant status data/checkresults in portal
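The dispatcher step could look roughly like the sketch below: device records from the configuration authority are turned into a Python dictionary literal that a config generator can read. The slides do not show the real dispatcher, so the record fields, the host names and the variable name `device_info` are all hypothetical, and the call to check_mk itself is left out.

```python
import pprint

# Hypothetical records, standing in for rows from ACOnetDB / the VIX database.
DEVICES = [
    {"name": "core1.example.net", "site": "site-a", "tags": ["backbone"]},
    {"name": "vix-sw1.example.net", "site": "site-b", "tags": ["vix", "switch"]},
]


def render_device_dict(devices, varname="device_info"):
    """Return Python source that defines one dictionary keyed by host name.

    `varname` is a made-up name for illustration, not a check_mk setting.
    """
    table = {}
    for dev in devices:
        table[dev["name"]] = {"site": dev["site"], "tags": sorted(dev["tags"])}
    # pprint sorts dictionary keys, so the generated file is stable between runs.
    return "%s = %s\n" % (varname, pprint.pformat(table))


if __name__ == "__main__":
    print(render_device_dict(DEVICES))
```

Generating the file from the database on every change keeps the database as the single configuration authority, which is the point made on this slide.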
8
Monitoring@ACOnet
■ characteristics
  ■ one active check per device
  ■ results used in many passive checks
  ■ SNMPv2 (except older power-measurement devices)
  ■ no traps
  ■ perfdata in RRDs
  ■ OID cache
  ■ SNMPv2 bulkwalks
  ■ ido2db - PostgreSQL
  ■ one poll for statistics and threshold decision
  ■ use of rrdcached speeds up the whole thing
  ■ Icinga classic UI
  ■ two monitoring hosts at different locations
  ■ dedicated hardware for monitoring
    ■ commodity HP hardware
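The "one active check per device, results used in many passive checks" pattern can be sketched as follows: a single poll of a device yields several service results, and each one is formatted as a passive check result. The command name `PROCESS_SERVICE_CHECK_RESULT` is the standard Nagios/Icinga external command; the host name, service names and outputs are hypothetical, and the slides do not say whether results are submitted this way in the ACOnet setup.

```python
import time


def passive_result_lines(host, results, now=None):
    """Format one external-command line per (service, return_code, output)."""
    ts = int(time.time() if now is None else now)
    lines = []
    for service, code, output in results:
        lines.append(
            "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s"
            % (ts, host, service, code, output)
        )
    return lines


# Hypothetical outcome of ONE poll of one device: several services at once.
POLL = [
    ("Interface Te0/0/0/1", 0, "OK - up"),
    ("Temperature", 1, "WARNING - temperature is 41 C"),
]

if __name__ == "__main__":
    for line in passive_result_lines("core1.example.net", POLL):
        print(line)
```

Because all services of a device are fed from the same poll, the device is queried once for both statistics and threshold decisions.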
9
Monitoring@ACOnet
10
Monitoring@ACOnet
■ what do we check/graph
  ■ traffic/packets/errors/discards
  ■ CoS (QoS) - basis for cost sharing model
  ■ module status
  ■ BGP
    ■ incl. prefix count
    ■ @Cisco ASR9K also IPv6
  ■ ICMP RTT in v4 and v6
  ■ Memory/CPU usage
  ■ temperatures
  ■ DOM
  ■ .....
  ■ @VIX
    ■ power consumption (for billing of RUs)
    ■ bird BGP-daemon
    ■ special: Proxy ARP check
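Traffic figures of this kind are normally derived from SNMP octet counters sampled at two points in time. The slides do not describe how the rates are calculated at ACOnet, so the following is only a generic sketch; the function name and parameters are hypothetical. It assumes the counter wrapped at most once between the two samples.

```python
def counter_rate_bps(prev_octets, curr_octets, interval_s, counter_bits=64):
    """Return the average bit rate between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        # Counter wrapped around; assume exactly one wrap.
        delta += 2 ** counter_bits
    return delta * 8 / interval_s


if __name__ == "__main__":
    # 3000 octets in 60 s is 400 bit/s.
    print(counter_rate_bps(1000, 4000, 60))
```

RRDtool's COUNTER data source type performs a comparable calculation when raw counter values are stored directly.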
11
Monitoring@ACOnet
■ Enhancements
  ■ ASR9k integrated
  ■ checks and statistics in <45 s per device
    ■ check latency >200 s when using Cacti/Nagios
  ■ less CPU consumed by SNMP on monitored devices
  ■ load on monitoring host between 0.3 and 0.9
    ■ compared to 5 (Nagios/Cacti)
  ■ VIX routeserver (bird) monitoring established
  ■ reduced IO load due to rrdcached
  ■ easy (?) implementation of new checks
■ advantages of Icinga
  ■ active development
    ■ e.g. flexible downtime, multiple acknowledgements, .....
  ■ easy to bring in new ideas :)
12
Monitoring@ACOnet
■ Future
  ■ dependencies
  ■ finer-grained notifications
  ■ weathermap redesign
13
Monitoring@ACOnet
14
Monitoring@ACOnet
Questions?