Monday, June 27, 2016

Observium performance tuning

I noticed last week that all our Cisco ASR 9000-routers have a bug in IOS-XR 4.3.4 which causes all SNMP queries of LLDP data to be super slow.
Almost 70% of the total time it took to run Observium discovery was spent on the simple snmpwalk of LLDP-data, and while the routers were busy processing this query they did not respond to other SNMP queries.
This resulted in a lot of gaps in our graphs for these routers every 6 hours when the discovery-script was running. It was a pretty easy fix: as soon as we upgraded IOS-XR to a newer version the routers started to respond much faster.
But this got me thinking: which other SNMP-queries are super slow?

So I hacked together a small script that parses the logfile of the Observium poller or discovery-script, lists all SNMP-commands that Observium runs and sorts them by runtime.


Simply edit your cronjob so that the Observium discovery-script writes a logfile for you, like this:
33  */6   * * *   root    /opt/observium/discovery.php -d -h all > /opt/observium/logs/discovery_log
and then wait for the discovery-script to complete its discovery of your network. Then run observium-logparser.py, which looks in /opt/observium/logs/discovery_log by default and produces a top 10-list of the slowest SNMP-queries in your network.
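
To illustrate the idea (this is not the actual observium-logparser.py), here is a minimal Python sketch of such a parser. It assumes the debug log writes each command as "CMD[...]" followed later by a "CMD RUNTIME[...s]" line; that format and the function name slowest_snmp_commands are assumptions made for this example, so check them against your own logfile.

```python
import re

# ASSUMED log format, not confirmed by this post:
#   CMD[<full command line>]
#   CMD RUNTIME[<seconds>s]
CMD_RE = re.compile(r"CMD\[(?P<cmd>.+)\]")
RUNTIME_RE = re.compile(r"CMD RUNTIME\[(?P<secs>\d+(?:\.\d+)?)s\]")


def slowest_snmp_commands(lines, top=10):
    """Return up to `top` (seconds, command) tuples, slowest first."""
    results = []
    current = None  # last SNMP command seen that has no runtime yet
    for line in lines:
        cmd_match = CMD_RE.search(line)
        if cmd_match:
            cmd = cmd_match.group("cmd")
            # Only track SNMP commands; anything else resets the state.
            current = cmd if "snmp" in cmd else None
            continue
        runtime_match = RUNTIME_RE.search(line)
        if runtime_match and current is not None:
            results.append((float(runtime_match.group("secs")), current))
            current = None
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top]
```

Calling slowest_snmp_commands(open("/opt/observium/logs/discovery_log", errors="replace")) and printing each tuple would give a rough top 10-list, provided the log really follows the assumed format.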

For the poller-script you need to do some manual magic first if you use the poller-wrapper.
First run the poller-wrapper with the debug-flag
sudo /opt/observium/poller-wrapper.py -d
It will now produce a lot of debug-files in your /tmp/-directory. You can put them all in the same file by running:
 cat /tmp/observium_poller_* > poller_debug.txt
Now you have the poller_debug.txt-file, which is the debug-log of all the pollers. Simply run observium-logparser and use the --logfile flag to point it at the poller debug file, as in the picture above.
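
If you would rather do the merge step from Python, for example as part of a larger script, a rough equivalent of the cat command could look like this. The function name merge_debug_files is made up for this sketch.

```python
import glob
import shutil


def merge_debug_files(pattern, output_path):
    """Concatenate all files matching `pattern` into `output_path`.

    Files are appended in sorted name order, like a shell glob.
    Returns the list of files that were merged.
    """
    paths = sorted(glob.glob(pattern))
    with open(output_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return paths
```

For example, merge_debug_files("/tmp/observium_poller_*", "poller_debug.txt") should leave you with the same file as the cat command. Keep the output file outside the glob pattern so it is not merged into itself.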

The script can be downloaded from here: https://github.com/ZerxXxes/observium-logparser

Hope this helps you find unnecessarily slow devices in your network!


Wednesday, June 8, 2016

Integrating Slack with Observium and testing alerts

Observium can fairly easily be configured to send alerts to Slack.
Start by setting up an incoming WebHook in Slack if you don't already have one.
When you have created your new WebHook, make sure to note the Webhook URL; you will need it in the next step.

Now log in to Observium, go to Contacts in the menu and click on Add Contact.
Choose Slack as Method, fill in the form with a description, and paste the Webhook URL you got from Slack into the Instance URL field. Then hit the Add Contact-button.
When you are done you will be back at the Contacts-page. Now click on your new contact again to edit it. In the Contact settings-page you will see all your configured Observium-alerts in the box to the right. Choose which alerts you want to send to Slack and click Associate.
Congratulations, you are done. As soon as an Observium Alert triggers it will now send information about it to Slack via the WebHook you created.
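
If nothing shows up in Slack, it can help to check the WebHook on its own, outside Observium. Slack incoming WebHooks accept a JSON body with a "text" field, so a small Python sketch like this can send a test message. The function names are made up for this example, and you need to pass in your own Webhook URL.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack incoming WebHook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url, text, timeout=10):
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling send_slack_message(your_webhook_url, "Test from the Observium box") should post the text to the channel tied to the WebHook. If that works but Observium alerts still do not arrive, the problem is more likely in the contact or alert association.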

Testing your alert-checker

You can test your new alerting with an Observium script called test_alert.php. Note that you need CLI-access to the Observium-box for this; it's not possible to test alerts via the web interface yet.
Start by listing your alert checkers: in the main menu of the web interface, click on Alert Checks.
Here choose an alert check you want to test, preferably one associated with a contact like the Slack Webhook we created earlier.
Click on the alert check and you will be at the settings page of that alert checker.
Here you will see all entities tied to the alert checker.
In my case I have an alert checker that monitors all my interfaces that are 10G or greater and alerts if they reach more than 80% utilization. The list of entities in this case is therefore all the interfaces that are monitored.
Next to every entity there is a small "i" that you can click on. Choose one entity and click on the "i"; it will take you to the status page of this specific alert entity.
Here we need to find the alert entry number, which will be in the URL when you are at this status page. Look in the browser URL for: alert_entry= and copy the number.
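
If you want to script this step, the number can be pulled out of a pasted URL with a short regular expression. This is only a sketch; the helper name extract_alert_entry is made up here, and the only thing it relies on is that the URL contains alert_entry= followed by digits.

```python
import re


def extract_alert_entry(url):
    """Return the number after 'alert_entry=' in the URL, or None if absent."""
    match = re.search(r"alert_entry=(\d+)", url)
    return int(match.group(1)) if match else None
```

Given a status page URL copied from the browser, extract_alert_entry(url) returns the entry number as an integer, or None if the URL does not contain alert_entry= at all.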
Now log in to the Observium CLI, go to your Observium directory and run the test_alert.php-script.
The script needs the alert_entry-number to know which alert to test; this is passed with the -a flag.
So simply run the script like this: ./test_alert.php -a <entry-number>
The script will now check the alert and send alerts to all associated contacts, like the Slack webhook we created earlier, which confirms that it works as intended!
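
If you test alerts often, the same call can be wrapped in a few lines of Python. This is just a sketch: the function names are made up, and /opt/observium as the install directory is an assumption that you may need to change. The only flag used is -a, as described above.

```python
import subprocess


def build_test_alert_command(alert_entry):
    """Return the argument list for ./test_alert.php -a <entry-number>."""
    # int() rejects anything that is not a plain number before it reaches the CLI.
    return ["./test_alert.php", "-a", str(int(alert_entry))]


def run_test_alert(alert_entry, observium_dir="/opt/observium"):
    """Run test_alert.php from the (assumed) Observium directory.

    Returns the exit code of the script.
    """
    command = build_test_alert_command(alert_entry)
    completed = subprocess.run(command, cwd=observium_dir, check=False)
    return completed.returncode
```

For example, run_test_alert(1234) runs the script for alert entry 1234. The number here is only a placeholder for the one you copied from the URL.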