Monday, June 27, 2016

Observium performance tuning

Last week I noticed that all our Cisco ASR 9000 routers were hitting a bug in IOS-XR 4.3.4 that makes every SNMP query for LLDP data extremely slow.
Almost 70% of the total Observium discovery runtime was spent on a simple snmpwalk of the LLDP data, and while the routers were busy processing that query they did not respond to other SNMP queries.
This left gaps in our graphs for these routers every 6 hours, whenever the discovery script was running. The fix was pretty easy: as soon as we upgraded IOS-XR to a newer version, the routers started responding much faster.
But this got me thinking: which other SNMP queries are super slow?

So I hacked together a small script that parses the log file of the Observium poller or discovery script, lists every SNMP command that Observium ran, and sorts them by runtime.
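To give a rough idea of the approach, here is a minimal sketch of such a parser. It is not the code of observium-logparser itself. It assumes the debug log prints each command as a line containing CMD[...] followed later by a line containing CMD RUNTIME[<seconds>s]; check your own log and adjust the two regular expressions if the format differs. The function name slowest_commands is made up for this example.

```python
import re

# ASSUMPTION: each SNMP command appears in the debug log as
#   CMD[/usr/bin/snmpbulkwalk ... lldpRemTable]
# followed (possibly after other lines) by
#   CMD RUNTIME[12.3456s]
# Adjust these patterns if your log looks different.
CMD_RE = re.compile(r"CMD\[(?P<cmd>.*)\]\s*$")
RUNTIME_RE = re.compile(r"CMD RUNTIME\[(?P<secs>\d+(?:\.\d+)?)s\]")


def slowest_commands(lines, top=10):
    """Return up to `top` (seconds, command) pairs, slowest first."""
    timed = []
    pending = None  # last command seen that has no runtime yet
    for line in lines:
        runtime = RUNTIME_RE.search(line)
        if runtime and pending is not None:
            # Pair the runtime with the command that preceded it.
            timed.append((float(runtime.group("secs")), pending))
            pending = None
            continue
        cmd = CMD_RE.search(line)
        if cmd:
            pending = cmd.group("cmd")
    # Sort by runtime only, longest first.
    timed.sort(key=lambda item: item[0], reverse=True)
    return timed[:top]
```

Since the function accepts any iterable of lines, you can pass it an open file object for the discovery log and print the returned pairs to get a top 10.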


Simply edit your cron job so that the Observium discovery script writes a log file, like this:
33  */6   * * *   root    /opt/observium/discovery.php -d -h all > /opt/observium/logs/discovery_log
Then wait for the discovery script to finish discovering your network. After that, run observium-logparser.py, which reads /opt/observium/logs/discovery_log by default and produces a top-10 list of the slowest SNMP queries in your network.

For the poller script you need to do some manual work first if you use the poller-wrapper.
First run the poller-wrapper with the debug flag:
sudo /opt/observium/poller-wrapper.py -d
It will now produce a lot of debug files in your /tmp/ directory. You can combine them into a single file by running:
 cat /tmp/observium_poller_* > poller_debug.txt
Now you have poller_debug.txt, which is the debug log of all the pollers. Simply run observium-logparser with the --logfile flag pointing at the poller debug file, as in the picture above.

The script can be downloaded from here: https://github.com/ZerxXxes/observium-logparser

Hope this helps you find unnecessarily slow devices in your network!

