Tuesday, October 3, 2017

Scaling Observium horizontally

EXPERIMENTAL FEATURE! THERE ARE NO KNOWN LARGE INSTALLATIONS USING THIS SETUP IN PRODUCTION YET. IF YOU BUILD THIS KIND OF SETUP IN PRODUCTION PLEASE CONTACT THE OBSERVIUM DEVELOPERS WITH YOUR PERFORMANCE EXPERIENCE AND ANY BUGS THAT YOU MAY RUN INTO.

For very large installations (1000+ devices) it can be hard to make a single server fast enough to poll all your devices within 5 minutes. Luckily, you can split the separate functions of Observium across different servers to make it scale horizontally. You can then simply add more servers to your pool of poller-servers when you need more polling power. This quick guide shows you how to set it up.

Requirements

  • Observium Pro Edition
  • rrdtool 1.6.0 or greater
  • php 7.x
  • a fast dedicated MySQL-server
  • Very fast storage for RRD-files, SSDs highly recommended



MySQL-Server

Start by installing a MySQL server on a dedicated machine. If your installation will be very large, it's recommended to tune the settings after the database has run for 48 hours, as there is a lot of tuning that can be done. You could also use MariaDB or Percona Server if you prefer, or even an Amazon Aurora database if you are building this in AWS.

After installing the database you should make it listen for queries on the network interface. This is done by adding the following to your MySQL config file:
bind-address=<ip-of-your-server>
Or the corresponding configuration for any other database-server you choose.
Make sure the firewall allows connections to port 3306 for MySQL. You now have a database server ready to serve over the network.

RRD-Server

It's now time to install the main Observium server. This server will serve as RRD storage and will receive RRD data from all your pollers, which it needs to write to disk. This puts the storage of this server under heavy load, and the bottleneck of your entire installation will probably come down to how fast this server can write to disk.
Dedicated SSDs for this server are therefore highly recommended. Make sure not to use consumer-grade SSDs, as the amount of data this server writes will probably wear out a consumer SSD in a couple of months.
Choose a pair of heavy-duty enterprise-class SSDs and preferably put them in RAID 1 for resilience.

Then proceed to install this server as a standard Observium server. Just follow the installation instructions on www.observium.org, except for two details.
First, skip the part where you install a MySQL server; this server will not run MySQL at all. When it's time to create the MySQL user, do this on your dedicated MySQL server instead, and make sure to change the database settings in config.php before you run discovery.php for the first time:
$config['db_host']      = '<your-mysql-server-ip>';
Second, skip the part where you add discovery.php and poller-wrapper.py to cron. The only thing this server should run from cron is housekeeping.php.

rrdcached

When your main Observium server is installed, it's time to install rrdcached. rrdcached will serve as the interface for RRD writes for all your pollers. It receives RRD data over the network from all the poller machines, caches the data in memory and then writes it to disk in larger batches. This spares your storage the worst I/O bursts and at the same time gives your pollers a simple way of sending the data over the network.
Follow my other guide on how to set up rrdcached here: http://blog.best-practice.se/2014/10/using-rrdcached-with-observium.html
After the installation is done you need to add a few more flags to the rrdcached config. First make sure that you have the flags:
-BRO
These flags make sure it's only possible to write to RRD files in the directory you assigned, and that any attempt to overwrite existing RRD files with the create command is ignored.
Next add the flag:
-L
This will make rrdcached listen on all network interfaces on the default port (42217).
Make sure the firewall allows connections on this port, and then your RRD-server is ready to go.
Also make sure this machine is configured to use rrdcached in config.php.
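
Before moving on to the pollers it can be worth confirming that both services are actually reachable over the network. The following is only a minimal sketch: the function name port_is_open is my own, and it only proves that a TCP connection can be opened, not that MySQL or rrdcached answer correctly.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the with-block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from a poller-server, port_is_open("<ip-of-your-mysql-server>", 3306) and port_is_open("<ip-of-your-rrd-server>", 42217) should both return True once the firewalls are open.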

Poller-Servers

The poller-servers are the machines that actually poll the SNMP devices. You can have as many poller-servers as you need, and you can add more later when you need to scale up.
On every poller-server, do a standard Observium installation but without the Apache and MySQL parts. Then, just as on the RRD-server, change the database setting in config.php to:
$config['db_host']      = '<your-mysql-server-ip>';
Then add the following two lines to config.php:
$config['rrdcached']    = "<your-rrd-server-ip>";
$config['rrd']['no_local'] = TRUE;
This tells Observium that there are no local RRDs on this installation and where to find the rrdcached server to write all the RRDs to.
Next, edit the cronjob. Delete all the housekeeping.php jobs, as housekeeping is done by the RRD-server itself, and then add the two flags -i and -n to the poller-wrapper.py and discovery.php jobs. The -i flag tells Observium how many poller-servers you are running and the -n flag tells it which of them this server is (note that this number starts from 0).
So for example if you run 3 poller-servers your first server will have this cronjob:
*/5 *     * * *   root    /opt/observium/poller-wrapper.py -i 3 -n 0  >> /dev/null 2>&1
33  */6   * * *   root    /opt/observium/discovery.php -h all -i 3 -n 0 >> /dev/null 2>&1 
The next poller-server will have everything the same but -n 1 instead.
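
If you run many pollers it is easy to mistype one of the numbers, so here is a small sketch that prints the two cron lines for each poller-server. The helper name poller_cron_lines is hypothetical, and the schedule and paths are simply copied from the example above.

```python
def poller_cron_lines(total_pollers, poller_index, base="/opt/observium"):
    """Build the two cron lines for one poller-server.

    total_pollers is passed as -i, poller_index as -n (counting from 0).
    """
    if total_pollers < 1:
        raise ValueError("total_pollers must be at least 1")
    if not 0 <= poller_index < total_pollers:
        raise ValueError("poller_index must be between 0 and total_pollers - 1")
    flags = f"-i {total_pollers} -n {poller_index}"
    return [
        f"*/5 *     * * *   root    {base}/poller-wrapper.py {flags} >> /dev/null 2>&1",
        f"33  */6   * * *   root    {base}/discovery.php -h all {flags} >> /dev/null 2>&1",
    ]


# Print the cron lines for a pool of 3 poller-servers.
for index in range(3):
    print(f"# poller-server {index}")
    print("\n".join(poller_cron_lines(3, index)))
```

The range check is the useful part: with 3 pollers the valid values for -n are 0, 1 and 2, and asking for -n 3 raises an error instead of silently leaving a third of the devices unpolled.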

The last discovery job, which only discovers new devices (discovery.php -h new), cannot be split across multiple pollers. It is a very small job that finishes fast, so just keep it on the first of your poller-servers and remove it from the others.
That's it! Your poller-servers should now start fetching the device list from the database, poll their respective part of it and then feed the results and RRD data over the network back to your servers.
If you visit the "Polling Information" tab in Observium you should now see a number of separate Wrapper Processes in the graph.

Updating

Be very careful when you update your installation. With many different processes all writing to the database at the same time, it's very important that all of them run the same version.
Make sure that you update Observium on all your machines at the same time and that only one of them runs ./discovery.php -u directly afterwards so that the database is correctly updated.
It might even be a good idea to stop all cronjobs before updating, to be on the safe side.

Optimizing

If you run a large enough install that you need this, do not forget to check out all the performance tuning that can be done in Observium: http://docs.observium.org/tuning/
PHP 7 is a requirement as it gives a huge performance boost. Also make sure that opcode caching is enabled for the CLI.
As the database grows with a lot of devices and ports, the web interface will soon become pretty slow, so also make sure that you enable the fast userspace caching.
If your install is used by a lot of users, it might be nice to switch out the Apache webserver for nginx and enable HTTP/2 support, as this will load resources in the web interface much faster.
You could also experiment with the -t flag on rrdcached. This sets the number of write threads that rrdcached uses; the default is 4. Increasing this might improve disk write performance.

Installscript

As installing a lot of Observium-instances can become tiresome I decided to write a small shell-script that automates the process for you.
It works well with Ubuntu 16 and Observium Pro or CE. Just download the script to the server, make it executable and then run it.

Monday, June 27, 2016

Observium performance tuning

I noticed last week that all our Cisco ASR 9000 routers have a bug in IOS-XR 4.3.4 which causes all SNMP queries for LLDP data to be super slow.
Almost 70% of the total time it took to run Observium discovery was spent on the simple snmpwalk of LLDP data, and while the routers were busy processing this query they did not respond to other SNMP queries.
This resulted in a lot of gaps in our graphs for these routers every 6 hours when the discovery script was running. It was a pretty easy fix: as soon as we upgraded IOS-XR to a newer version the routers started to respond much faster.
But this got me thinking: which other SNMP queries are super slow?

So I hacked together a small script that parses the logfile of the Observium poller or discovery script, lists all SNMP commands that Observium runs and sorts them by runtime.


Simply edit your cronjob so that the Observium discovery script writes a logfile, like this:
33  */6   * * *   root    /opt/observium/discovery.php -d -h all > /opt/observium/logs/discovery_log
Then wait for the discovery script to complete its discovery of your network and run observium-logparser.py, which by default looks in /opt/observium/logs/discovery_log and produces a top 10 list of the slowest SNMP queries in your network.

For the poller-script you need to do some manual magic first if you use the poller-wrapper.
First run the poller-wrapper with the debug-flag
sudo /opt/observium/poller-wrapper.py -d
It will now produce a lot of debug files in your /tmp/ directory. You can combine them all into one file by running:
 cat /tmp/observium_poller_* > poller_debug.txt
Now you have the file poller_debug.txt, which is the debug log of all the pollers. Simply run observium-logparser and use the --logfile flag to point at the poller debug log, as in the picture above.

The script can be downloaded from here: https://github.com/ZerxXxes/observium-logparser

Hope this helps you find unnecessarily slow devices in your network!


Wednesday, June 8, 2016

Integrating Slack with Observium and testing alerts

Observium can fairly easily be configured to send alerts to Slack.
Start by setting up an incoming WebHook in Slack if you don't already have one.
When you have created your new WebHook, make a note of the Webhook URL; you will need it in the next step.

Now log in to Observium, go to Contacts in the menu and click on Add Contact.
Choose Slack as Method, fill in the form with a description and paste the Webhook URL you got from Slack into the field Instance URL. Then hit the Add Contact button.
When you are done you will be back at the Contacts page. Now click on your new contact again to edit it. On the Contact settings page you will see all your configured Observium alerts in the box to the right. Choose which alerts you want to send to Slack and click Associate.
Congratulations, you are done. As soon as an Observium alert triggers, it will now send information about it to Slack via the WebHook you created.

Testing your alert-checker

You can test your new alerting with a script in Observium called test_alert.php. Note that you need CLI access to the Observium box for this; it's not possible to test alerts via the web interface yet.
Start by listing your alert checkers: in the main menu of the web interface, click on Alert Checks.
Here, choose an alert check you want to test, preferably one associated with a contact like the Slack Webhook we created earlier.
Click on the alert check and you will land on the settings page of that alert checker.
Here you will see all entities tied to the alert checker.
In my case I have an alert checker that monitors all my interfaces that are 10G or greater and alerts if they reach more than 80% utilization. The list of entities in this case is therefore all the interfaces that are monitored.
Next to every entity there is a small "i" that you can click on. Choose one entity and click on the "i"; it will take you to the status page of this specific alert entity.
Here we need to find the alert entry number, which is in the URL when you are on this status page. Look in the browser URL for alert_entry= and copy the number.
Now log in to the Observium CLI, go to your Observium directory and run the test_alert.php script.
The script needs the alert_entry number to know which alert to test; this is passed with the -a flag.
So simply run the script like this: ./test_alert.php -a <entry-number>
The script will now check the alert and send alerts to all associated contacts like the Slack webhook we created earlier which gives us confirmation that it works as intended!
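
If you test alerts often, the entry number can also be pulled out of the copied URL with a few lines of Python. This is just a sketch: the function name and the example URL are made up for illustration, and the only thing it relies on is that the URL contains alert_entry= followed by digits.

```python
import re


def alert_entry_from_url(url):
    """Return the number following 'alert_entry=' in url, or None if absent."""
    match = re.search(r"alert_entry=(\d+)", url)
    return int(match.group(1)) if match else None


# Hypothetical URL, for illustration only.
url = "https://observium.example.com/alert/alert_entry=1234/"
entry = alert_entry_from_url(url)
if entry is not None:
    # Print the command to run from the Observium directory.
    print(f"./test_alert.php -a {entry}")
```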

Monday, December 14, 2015

Vastly improve Observium performance with PHP 7

This is a quick guide on how to install PHP 7 in Ubuntu and have it work with Observium to greatly speed things up. Note that this will only be useful if you have a very large Observium-installation and start to notice the web interface slowing down.

PHP 7.0.0 was released a couple of weeks ago and promised up to double the performance of PHP 5.
I run a very large Observium installation with over 1100 monitored devices, and as a result the web interface was very slow. When I heard about the big performance improvements in PHP 7 I decided to try it out.

The result was even better than I expected! With PHP 5, most of the pages in the web UI needed about 5s to render and some even needed over 7s. After switching to PHP 7 my UI now loads most pages in less than 1.5s and the heaviest pages render in about 2s.
PHP7 proved to perform 2 to 3 times as fast as PHP5!

Install Observium with PHP 7

To install Observium with PHP 7 you need to change a few steps from the install guide.
Note that this only works for Observium Pro for now; Observium CE won't work with PHP 7 yet!

This is based on the Ubuntu 14-guide and assumes you have a freshly installed Ubuntu 14.
First we need to add the repository for PHP7:
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:ondrej/php-7.0
Then update and install PHP 7:
sudo apt-get update
sudo apt-get install php7.0 php7.0-mysql php7.0-mcrypt libapache2-mod-php7.0
And lastly we install the rest of the required Observium packages:
sudo apt-get install php-pear snmp fping mysql-server mysql-client python-mysqldb rrdtool subversion whois mtr-tiny ipmitool graphviz imagemagick
Now we can continue to follow the guide for Observium Pro. The only important thing is to make sure your config.php contains this option:
$config['db_extension'] = 'mysqli';
before you try to run the update.php-script.

When the installation is complete you can visit the About Observium page and you will see that Observium runs on PHP 7.

My Observium is polling slowly, will PHP7 help me?

No, polling is almost entirely dependent on disk IO and the SNMP response speed of the devices you poll. PHP 7 will make very little difference to the polling time.

Does Observium CE work with PHP7?

No, the current CE does not support PHP 7; you need to have Pro.

Is Observium fully compatible with PHP7?

I have tried to test as many of the features in Observium that I could, and so far everything seems to work just fine. I have not tried the billing-module however.
Always try this in a test-install first, as there may be some features that are not fully compatible.

Saturday, March 14, 2015

Upgrade vCenter 5.5 to 6.0

This week VMware released vSphere 6.0 and vCenter 6.0, in which one of the big news items is a much faster and more responsive web client.
The install process of vCenter 6.0 is not completely obvious, as it's not an OVA file to deploy, so I decided to write this quick guide on how to upgrade vCenter 5.5 to 6.0.

1. Download the ISO

Go to the VMware download page (https://my.vmware.com/web/vmware/details?downloadGroup=VC600&productId=491&rPId=7501) and download the ISO file called VMware vCenter Server Appliance to your computer.

2. Mount/Extract the ISO

The next step is to mount the ISO. If you are running Windows 8 you can just right-click it and choose "Mount" to have it mounted as a virtual CD drive. Otherwise you can use WinRAR or 7-Zip to extract the files from the ISO.

3. Upgrade the VMware Client Integration Plugin

On the ISO, find the folder named "vcsa"; inside it you will find an installer for upgrading the VMware browser plugin on your computer to 6.0. Remember to close all open browsers first, then run the installer.

4. Start the upgrade

When the browser plugin is upgraded, go back to the ISO. In the root you will find a file called vcsa-setup.html; open it with your browser and choose "Upgrade".
Accept the EULA and continue.
The first thing the installer needs is the target server and credentials. Note that this is the target ESXi host where you want to install your vCenter 6.0 VM, NOT your current vCenter server.

5. Source Appliance

The next step is to choose a name for your new vCenter server and then enter all the login info for your old vCenter server and the ESXi host that it is currently deployed on.
When you have entered all the login details you will have to choose the size of your vCenter installation (how many hosts and VMs it should be able to manage) and which datastore to put it in.
The last step is to assign an IP address to the new vCenter installation from which it can reach your old vCenter. Note that this address will only be used temporarily; as soon as the install is complete your old vCenter will be deleted and the new install will start using the same address as your old install.

That was the last step. Now your upgrade will begin; this process takes a fairly long time, at least 15 minutes, so have patience.
As soon as the upgrade is complete you can go ahead and log in to your new vCenter 6.0 via the new, fast web client.
Hope this guide was of some help.

Thursday, October 23, 2014

Using rrdcached with Observium

rrdcached is a daemon that receives updates to existing RRD files, accumulates them and, if enough have been received or a defined time has passed, writes the updates to the RRD file.
This can be very useful in a bigger Observium instance: as the number of polled interfaces grows, so does the number of RRD files updated on each polling run, and soon each run generates a lot of random writes to your storage.
rrdcached gives us the possibility to trade some of that IO for memory. This can be a very good deal for virtual machines or any server with slower storage but a fair amount of memory.

This small guide is written and tested on Ubuntu Linux and I can't guarantee that it will work on any other distro.

1. Start by installing rrdcached using your favorite package manager
#  sudo apt-get update && sudo apt-get install rrdcached

2. The rrdcached daemon will be started automatically so we need to stop it.
# sudo service rrdcached stop
Stopping RRDtool data caching daemon: rrdcached.

3. Edit the start up options for the rrdcached daemon
# sudo nano /etc/default/rrdcached

Find the line in the config file with #OPTS= and replace it with these options:
OPTS="-w 1800 -z 1800 -f 3600 -s www-data -l unix:/var/run/rrdcached.sock -j /var/lib/rrdcached/journal/ -F -b /opt/observium/rrd/ -B"
Don't forget to remove the leading # (hash sign)!

The values used here define the following:

  • -w 1800 Wait 1800s (30min) before writing data
  • -z 1800 Delay writes by a random factor of up to 30 minutes (this should be equal to, or lower than, “-w”)
  • -f 3600 Flush all data every 3600s (1 hour)
  • -s set the group owner of the socket to www-data (this needs to be set before -l)
  • -l path to our socket that observium will talk to
  • -j path to journaling files
  • -F ALWAYS flush all updates to the RRD data files when the daemon is shut down
  • -b path to observium RRD-files
  • -B Only permit writes into the base directory specified in -b (and any sub-directories).
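
The two ordering and size rules in the list (-z no larger than -w, and -s before -l) are easy to break when you tune the values later. Below is a small sketch that assembles the OPTS line and checks the first rule. The function name build_rrdcached_opts and its parameter names are my own invention; the defaults are the values used in this guide.

```python
def build_rrdcached_opts(write_timeout=1800, write_delay=1800, flush_interval=3600,
                         socket_group="www-data",
                         socket_path="unix:/var/run/rrdcached.sock",
                         journal_dir="/var/lib/rrdcached/journal/",
                         base_dir="/opt/observium/rrd/"):
    """Assemble the OPTS line for /etc/default/rrdcached."""
    if write_delay > write_timeout:
        raise ValueError("-z must be equal to, or lower than, -w")
    parts = [
        f"-w {write_timeout}",
        f"-z {write_delay}",
        f"-f {flush_interval}",
        f"-s {socket_group}",  # must be set before -l
        f"-l {socket_path}",
        f"-j {journal_dir}",
        "-F",
        f"-b {base_dir}",
        "-B",
    ]
    return 'OPTS="' + " ".join(parts) + '"'


print(build_rrdcached_opts())
```

With the defaults this prints exactly the OPTS line shown in step 3.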
4. Now start rrdcached daemon again
# sudo service rrdcached start
5.  Now here comes the big problem: as rrdcached is started as root, it will create the socket file (/var/run/rrdcached.sock) with root permissions. But the webserver user needs to be able to both read and write to this socket, so we need to change the owner to the webserver user (www-data in this case)
# sudo chown www-data:www-data /var/run/rrdcached.sock
This needs to be redone every time rrdcached is restarted as it will then recreate this file.
This can probably be solved by manipulating the init script or in some other clever way; suggestions are highly appreciated! There are also flags in rrdcached for this (-m and -s) but I never got them to work; the file owner was always root.
*NOTE* Step 5 no longer needed when using -s flag properly!

6. Last step, edit the observium config.php to use our rrdcached socket instead of writing RRDs directly.
# sudo nano /opt/observium/config.php
Then add this line to the config
 $config['rrdcached']    = "unix:/var/run/rrdcached.sock";
Save and exit! Now log in to Observium; your graphs should look as they used to. If you have a pretty big instance you should soon notice that the storage IO has decreased by a lot.

If all your graphs show "draw error", then your webserver user probably doesn't have read and write access to /var/run/rrdcached.sock.
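
A quick way to check this is to test the socket permissions as the webserver user. The sketch below uses a hypothetical helper name, socket_access_ok. Note that it takes the plain filesystem path, without the unix: prefix used in config.php.

```python
import os


def socket_access_ok(path="/var/run/rrdcached.sock"):
    """Return True if the current user can both read and write path."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


# Reports the result for the user running this script.
print(socket_access_ok())
```

The result only applies to the user running the script, so run it as the webserver user, for example with sudo -u www-data python3 followed by the file name you saved it under.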

This is the IO of my observium-machine after installing rrdcached:

Weathermap

If you are using PHP Weathermap with Observium then you will also need to edit the weathermap file to include:
putenv ("RRDCACHED_ADDRESS=/var/run/rrdcached.sock");

Tuesday, July 15, 2014

Using PHP Weathermap with Observium

The PHP Weathermap plugin is a very popular tool for mapping the link load of a network environment. It is usually used as a plugin to Cacti or MRTG but as Observium is gaining popularity I decided to make a quick guide for how you get the weathermap nicely integrated with Observium in a way that does not break when updating to newer versions of Observium.

This guide assumes that you already have a working installation of Observium, preferably an installation that was done using the Debian/Ubuntu guide.

1. Download the modified version of PHP Weathermap from GitHub: https://github.com/ZerxXxes/weathermap-for-observium and put it in your observium/html directory.
The easiest way to do this is using git:
cd /opt/observium/html/
git clone https://github.com/ZerxXxes/weathermap-for-observium.git weathermap
2. If you have used different paths for Observium or the weathermap plugin you will need to edit the variables in data-pick.php and map-poller.php. If you are following the installation guide, the default variables will work.

3. Open the file editor.php and change the value at the beginning to:
 $ENABLED=true
As long as this value is true, everyone who knows the right URL will be able to access the weathermap editor. It's therefore recommended to change this value back to false as soon as you are done editing.

4. Make sure the directory configs/ is writable by your webserver. One way is to change the owner of the directory to the webserver user (in Ubuntu the webserver user is usually called www-data):
cd weathermap/
chown www-data:www-data configs/
5. Create a new directory called maps/ and make the webserver-user the owner
mkdir maps/
chown www-data:www-data maps/
6. Now use your web browser to access the editor at weathermap/editor.php (e.g. browse to observium.myurl.com/weathermap/editor.php)

7. Create a new map by entering a name and clicking create map. Note that the map name *must* end with .conf (e.g. networkmap.conf)
Edit your map, create nodes and draw links and then pick graphs from Observium to use with the links.
*NOTE* Under Map Properties, make sure to define Output HTML Filename to maps/<mapname>.html and Output Image Filename to <mapname>.png
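
To make the naming rule concrete, here is a tiny sketch that derives both output names from a map name. The function name weathermap_outputs is hypothetical, and it assumes that <mapname> means the map name without the .conf ending.

```python
def weathermap_outputs(map_name):
    """Return (html_filename, image_filename) for a map name ending in .conf."""
    suffix = ".conf"
    if not map_name.endswith(suffix) or map_name == suffix:
        raise ValueError("map name must be non-empty and end with .conf")
    stem = map_name[:-len(suffix)]  # assumption: <mapname> excludes ".conf"
    return f"maps/{stem}.html", f"{stem}.png"


print(weathermap_outputs("networkmap.conf"))
```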




8. Make the file map-poller.php executable for your system by doing:
chmod +x map-poller.php
9. Add a new line in the cronjob at /etc/cron.d/observium after the Observium polling and discovery:
*/5 * * * * root /opt/observium/html/weathermap/map-poller.php >> /dev/null 2>&1
10. Move the file navbar-custom.inc.php into the observium/html/includes/ directory. This file does not exist by default in Observium, but Observium looks for it and includes the code from it if it exists. This makes it possible to add custom menus that do not break when you upgrade your Observium installation.
mv navbar-custom.inc.php /opt/observium/html/includes/navbar-custom.inc.php
*NOTE*
 If you are using the current community edition (based on revision 5229) or any revision older than 5670 you should instead use the file navbar-custom-old.inc.php and rename it.


Now that's it!
All .html-files in the maps/-directory will be linked in a sub menu in the Observium GUI like this:


Clicking on one of them will take you to the rendered weathermap, where you get a nice overview of your network's load.
Hovering the mouse over a link will show the Observium graph for that link like this:


And clicking on a link will take you to the Observium-page for that link.

Hope this guide has been helpful for you and thank you for reading!