Almost 70% of the total time it took to run Observium discovery was just the simple snmpwalk of LLDP-data, and while the routers were busy processing this query they did not respond to other SNMP queries.
This resulted in a lot of gaps in our graphs for these routers every 6 hours when the discovery-script was running. It was a pretty easy fix: as soon as we upgraded IOS-XR to a newer version, the routers started to respond much faster.
But this got me thinking; which other SNMP-queries are super slow?
So I hacked together a small script that will parse the logfile of Observium poller or discovery-script and list all SNMP-commands that Observium runs and sort them by runtime.
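To illustrate the idea, here is a minimal sketch of such a parser. It is not the code from observium-logparser, and the log format is an assumption: it expects the debug output to contain a `CMD[...]` line for each command, followed later by a `CMD RUNTIME[1.234s]` line. Your Observium version may print this differently, so adjust the two regular expressions to match your own log. The function names are made up for this example.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMPTION: the debug log contains pairs of lines that look like
#   CMD[/usr/bin/snmpwalk ...]
#   CMD RUNTIME[0.1234s]
# Check your own log and adapt these patterns if it differs.
CMD_RE = re.compile(r"^\s*CMD\[(?P<cmd>.*)\]\s*$")
RUNTIME_RE = re.compile(r"^\s*CMD RUNTIME\[(?P<secs>\d+(?:\.\d+)?)s\]\s*$")


def slowest_commands(lines: Iterable[str], top: int = 10) -> List[Tuple[float, str]]:
    """Return up to `top` (runtime_seconds, command) pairs, slowest first."""
    results: List[Tuple[float, str]] = []
    current = None  # the most recent command still waiting for its runtime
    for line in lines:
        cmd_match = CMD_RE.match(line)
        if cmd_match:
            current = cmd_match.group("cmd")
            continue
        runtime_match = RUNTIME_RE.match(line)
        if runtime_match and current is not None:
            results.append((float(runtime_match.group("secs")), current))
            current = None
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results[:top]


def slowest_in_file(path: str, top: int = 10) -> List[Tuple[float, str]]:
    """Convenience wrapper (hypothetical helper): parse a logfile on disk."""
    with open(path, errors="replace") as handle:
        return slowest_commands(handle, top)
```

Calling `slowest_in_file("/opt/observium/logs/discovery_log")` and printing each pair would give a simple top 10-list, slowest query first.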
Simply edit your cronjob so that the Observium discovery-script writes a logfile for you, like this:
33 */6 * * * root /opt/observium/discovery.php -d -h all > /opt/observium/logs/discovery_log
Then wait for the discovery-script to complete its discovery of your network. After that, run observium-logparser.py, which looks in /opt/observium/logs/discovery_log by default and produces a top 10-list of the slowest SNMP-queries in your network.
For the poller-script you need to do some manual magic first if you use the poller-wrapper.
First run the poller-wrapper with the debug-flag:
sudo /opt/observium/poller-wrapper.py -d
It will now produce a lot of debug-files in your /tmp/-directory. You can put them all in the same file by running:
cat /tmp/observium_poller_* > poller_debug.txt
Now you have the poller_debug.txt-file, which is the debug-log of all the pollers. Simply run observium-logparser and use the --logfile flag to point at the poller debug as in the picture above.
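If you would rather do the merge step from Python (for example inside your own wrapper script), the same thing as the cat command can be sketched like this. The function name `merge_logs` is made up for this example, and the glob pattern is the one from the cat command above.

```python
import glob
import shutil


def merge_logs(pattern: str, out_path: str) -> int:
    """Concatenate all files matching `pattern` into `out_path`.

    Files are appended in sorted name order. Returns the number of files merged.
    Keep `out_path` outside the pattern so the output is not merged into itself.
    """
    paths = sorted(glob.glob(pattern))
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(paths)
```

For the poller debug-files this would be `merge_logs("/tmp/observium_poller_*", "poller_debug.txt")`.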
The script can be downloaded from here: https://github.com/ZerxXxes/observium-logparser
Hope this helps you find unnecessarily slow devices in your network!