Thursday, October 23, 2014

Using rrdcached with Observium

rrdcached is a daemon that receives updates to existing RRD files, accumulates them and, if enough have been received or a defined time has passed, writes the updates to the RRD file.
This can be very useful in a larger Observium instance: as the number of polled interfaces grows, so does the number of RRD files updated on every polling run, and soon each run generates a lot of random writes to your storage.
rrdcached lets us trade some of that IO for memory. This can be a very good deal for virtual machines, or for any server with slow storage but a fair amount of memory.

This small guide was written and tested on Ubuntu Linux, and I can't guarantee that it will work on any other distro.

1. Start by installing rrdcached using your favorite package manager:
# sudo apt-get update && sudo apt-get install rrdcached

2. The rrdcached daemon is started automatically after installation, so we need to stop it:
# sudo service rrdcached stop
Stopping RRDtool data caching daemon: rrdcached.

3. Edit the startup options for the rrdcached daemon:
# sudo nano /etc/default/rrdcached

Find the line in the config file that starts with #OPTS= and replace it with these options:
OPTS="-w 1800 -z 1800 -f 3600 -s www-data -l unix:/var/run/rrdcached.sock -j /var/lib/rrdcached/journal/ -F -b /opt/observium/rrd/ -B"
Don't forget to remove the leading #, or the line stays commented out!

The values used here define the following:

  • -w 1800 Wait 1800s (30min) before writing data
  • -z 1800 Delay writes by a random factor of up to 30 minutes (this should be equal to, or lower than, “-w”)
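  • -f 3600 Every 3600s (1 hour), search the entire cache for old values and write them to disk (this mainly matters for files that have stopped receiving updates)
  • -s Set the group of the socket to www-data, giving that group read and write access (this needs to come before -l, because it applies to the sockets specified after it)
  • -l Path to the socket that Observium will talk to
  • -j Path to the journal files
  • -F ALWAYS flush all updates to the RRD data files when the daemon is shut down
  • -b Path to the Observium RRD files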
  • -B Only permit writes into the base directory specified in -b (and any sub-directories).
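Two of the notes in the list above are easy to get wrong when editing the file by hand: -z should not be larger than -w, and -s must come before -l. Below is a minimal sketch that checks an OPTS="..." line for those two rules, plus a still-present leading #. It assumes the quoted OPTS format shown in step 3. The function name check_opts is hypothetical and is not part of rrdcached or Observium.

```python
import shlex


def check_opts(opts_line):
    """Check an OPTS="..." line from /etc/default/rrdcached (hypothetical helper).

    Returns a list of problem descriptions. An empty list means the rules from
    this guide hold: the line is uncommented, -z is not larger than -w, and -s
    comes before -l.
    """
    problems = []
    stripped = opts_line.strip()
    if stripped.startswith("#"):
        problems.append("line is still commented out (remove the leading #)")

    # Drop everything up to the first "=", then the surrounding quotes.
    _, _, value = stripped.partition("=")
    args = shlex.split(value.strip().strip("\"'"))

    def value_of(flag):
        """Return the argument that follows flag, or None if it is absent."""
        if flag in args:
            i = args.index(flag)
            if i + 1 < len(args):
                return args[i + 1]
        return None

    w, z = value_of("-w"), value_of("-z")
    # Only compare plain second counts; anything else is left unchecked.
    if w and z and w.isdigit() and z.isdigit() and int(z) > int(w):
        problems.append(f"-z {z} is larger than -w {w}")

    # -s only affects the -l sockets listed after it.
    if "-s" in args and "-l" in args and args.index("-s") > args.index("-l"):
        problems.append("-s comes after -l, so it does not apply to that socket")
    return problems
```

To use it, pass the OPTS= line from /etc/default/rrdcached. An empty list means none of those rules is violated. It does not validate anything else about the options.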
4. Now start the rrdcached daemon again:
# sudo service rrdcached start
5. Now here comes the big problem: since rrdcached is started as root, it creates the socket file (/var/run/rrdcached.sock) owned by root. But the webserver user needs to be able to both read and write this socket, so we have to change the owner to the webserver user (www-data in this case):
# sudo chown www-data:www-data /var/run/rrdcached.sock
This needs to be redone every time rrdcached is restarted, because the daemon then recreates the file.
This can probably be solved by modifying the init script or in some other clever way; suggestions are highly appreciated! rrdcached also has flags for this (-m and -s), but I never got them to work; the file owner was always root.
*NOTE* Step 5 is no longer needed when the -s flag is used properly (placed before -l, as in the options above)! The socket is still owned by root, but the www-data group gets read and write access, which is enough.
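After a restart, you can check whether the webserver user can actually use the socket, whether you rely on the chown from step 5 or on the -s flag. The sketch below inspects the socket with os.stat. It treats access as OK if the user owns the socket with read/write bits set, or belongs to the socket's group with group read/write bits set. The default path and user are the ones used in this guide. The helper name check_socket is hypothetical.

```python
import grp
import os
import pwd
import stat


def check_socket(path="/var/run/rrdcached.sock", web_user="www-data"):
    """Return reasons why web_user may not be able to use the socket (hypothetical helper).

    An empty list means web_user either owns the socket with read/write bits
    set (the chown approach) or is in the socket's group with group
    read/write bits set (the -s approach).
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist (is rrdcached running?)"]
    if not stat.S_ISSOCK(st.st_mode):
        return [f"{path} is not a UNIX socket"]
    try:
        user = pwd.getpwnam(web_user)
    except KeyError:
        return [f"user {web_user} does not exist"]

    rw_owner = stat.S_IRUSR | stat.S_IWUSR
    rw_group = stat.S_IRGRP | stat.S_IWGRP
    if st.st_uid == user.pw_uid and (st.st_mode & rw_owner) == rw_owner:
        return []

    # Group access counts if the socket's group is the user's primary group
    # or the user is listed as a supplementary member of that group.
    try:
        members = grp.getgrgid(st.st_gid).gr_mem
    except KeyError:
        members = []
    in_group = st.st_gid == user.pw_gid or web_user in members
    if in_group and (st.st_mode & rw_group) == rw_group:
        return []

    mode = oct(stat.S_IMODE(st.st_mode))
    return [f"{path} (uid {st.st_uid}, gid {st.st_gid}, mode {mode}) "
            f"does not give {web_user} read and write access"]
```

Calling check_socket() after `sudo service rrdcached start` should return an empty list. Any message it returns points at the ownership or mode to fix.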

6. Last step: edit Observium's config.php so it uses our rrdcached socket instead of writing RRDs directly.
# sudo nano /opt/observium/config.php
Then add this line to the config:
 $config['rrdcached']    = "unix:/var/run/rrdcached.sock";
Save and exit! Now log in to Observium; your graphs should look as they used to. If you have a fairly big instance, you should soon notice that storage IO has dropped considerably.

If all your graphs show "draw error", your webserver user probably doesn't have read and write access to /var/run/rrdcached.sock.
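To confirm whether the webserver user can actually talk to the daemon, you can send it a command over the socket. This sketch assumes rrdcached's line-based text protocol. The first reply line is "<n> <message>", where a non-negative n is the number of lines that follow and a negative n means an error. STATS replies with "Key: value" counters. The socket path is the one from step 3, and query_stats and parse_stats are hypothetical helper names.

```python
import socket


def parse_stats(reply):
    """Parse an rrdcached STATS reply into a dict (hypothetical helper).

    The first line is "<n> <message>": a non-negative n is the number of
    "Key: value" lines that follow, a negative n signals an error.
    """
    lines = reply.splitlines()
    if not lines:
        raise RuntimeError("empty reply from rrdcached")
    status, _, message = lines[0].partition(" ")
    count = int(status)
    if count < 0:
        raise RuntimeError(f"rrdcached error: {message}")
    stats = {}
    for line in lines[1:1 + count]:
        key, _, value = line.partition(":")
        value = value.strip()
        # Counters are expected to be numeric; keep anything else as a string.
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def query_stats(path="/var/run/rrdcached.sock", timeout=5.0):
    """Send STATS over the UNIX socket and return the parsed counters."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)  # PermissionError here means the socket permissions are wrong
        sock.sendall(b"STATS\n")
        data = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break  # the daemon closed the connection
            data += chunk
            first, sep, _ = data.partition(b"\n")
            if sep:
                # Stop once the status line and the n lines it announces have arrived.
                n = int(first.split(b" ", 1)[0].decode())
                if n < 0 or data.count(b"\n") >= n + 1:
                    break
    return parse_stats(data.decode())
```

Save it with a `print(query_stats())` call at the end and run it as the webserver user (for example with `sudo -u www-data python3`). A dictionary of counters means the socket works for that user.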

This is the IO of my Observium machine after installing rrdcached:

Weathermap

If you are using PHP Weathermap with Observium, you will also need to edit the weathermap file to include:
putenv ("RRDCACHED_ADDRESS=/var/run/rrdcached.sock");

7 comments:

  1. Thanks this really worked! My polling finished over 5 times faster.

    Before: "INFO: poller-wrapper polled 312 devices in 1142 seconds with 20 workers"

    After: "INFO: poller-wrapper polled 312 devices in 198 seconds with 20 workers"

  2. I found out that the 198 seconds was caused by a single problem network device. My discoveries are actually completing in 51 seconds!

  3. Problem: Very slow Observium when rendering graphs. The top command shows rrd spiking from 2% to 36% CPU usage when graphs are accessed.
    Solution: Put rrd on a 4GB ramdisk.
    Result: Little to no improvement.
    Setup: Virtualized Debian 7 on Windows Server 2012 Hyper-V. VM has 10 GB RAM (4+6).
    Monitored devices: 18
    Question: Do you know what else I could do to speed up performance? (I did not use rrdcached because I read that a ramdisk is faster.)

  5. This seems to have caused issues with the Weathermap after I did this. I have links now that show 0% utilization even though they're not.

  6. Is it working well with Weathermap?

    1. Yes, but you need to edit the weathermap file to include:
      putenv ("RRDCACHED_ADDRESS=/var/run/rrdcached.sock");
