[TriLUG] Open source software to monitor hundreds of VMs

Chris Baldwin via TriLUG trilug at trilug.org
Mon Sep 12 10:54:30 EDT 2016


I've done Nagios + Puppet + mcollective & mnrpes at my last job. All open source.

Nagios v3 because, at the time, most other projects were reskinned Nagios with some plugins that didn't help us much. That might be different now; I haven't looked closely.

Puppet for configuration. Bringing up a new server (or 20, 50, 100 new servers) and having Puppet take care of all the config (app & monitoring) is awesome. We abused exported resources, which created some amusing performance issues for Puppet on the Nagios box.

Mcollective + mnrpes was for running passive checks on all our servers. We ran into scalability issues with Nagios' scheduler, and constantly giving it more resources wasn't a good option. We didn't want to spawn workers either, as that is no different from giving it more resources. Mnrpes works by having a static scheduler that sends an NRPE command out over mcollective every X seconds/minutes, then listens for the response and puts it in Nagios' FIFO queue. This let us run 9k checks every minute, with the Nagios box being a 2 core/4 GB RAM VM.
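
For anyone who wants to picture that flow, here is a rough Python sketch. It is not the mnrpes code itself, just an illustration under assumptions: the Check fields, the host and service names, and the command file path in the comment are all made up, and the mcollective round trip is left out entirely. It shows a fixed-interval scheduler picking the checks that are due, and a result being formatted as a Nagios PROCESS_SERVICE_CHECK_RESULT external command and written to the command file.

```python
import time
from dataclasses import dataclass


@dataclass
class Check:
    """One statically scheduled check (all field values are hypothetical)."""
    host: str            # host name as defined in Nagios
    service: str         # service description as defined in Nagios
    command: str         # NRPE command name the agent would be asked to run
    interval: int        # seconds between runs
    next_run: float = 0.0


def due_checks(checks, now):
    """Return the checks whose interval has elapsed and reschedule them."""
    due = []
    for check in checks:
        if now >= check.next_run:
            due.append(check)
            check.next_run = now + check.interval
    return due


def passive_result_line(host, service, return_code, output, now=None):
    """Format a passive service result as a Nagios external command line."""
    ts = int(time.time() if now is None else now)
    # Keep only the first line of plugin output so the command stays on one line.
    first_line = output.splitlines()[0] if output else ""
    return (
        f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{first_line}\n"
    )


def submit_result(command_file, line):
    """Write one command line to the Nagios external command file.

    The path is site specific (for example /var/lib/nagios3/rw/nagios.cmd on
    some installs); treat any path used here as an assumption.
    """
    with open(command_file, "w") as fh:
        fh.write(line)
```

The point of the design is that Nagios only has to read results from its command file; the scheduling and the waiting on remote agents happen outside of it.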

I should mention that this is being redesigned by the team I left there, but the premise held up fairly well for 3 years.

-Chris

> On Sep 12, 2016, at 10:23 AM, Aaron Joyner via TriLUG <trilug at trilug.org> wrote:
> 
> On the open source front, I'd recommend checking out Prometheus (
> https://prometheus.io/) -- it's as close as you can get to Borgmon.  I
> haven't used it personally for anything, but if I were in your shoes it's
> what I'd try.  There are lots of write-ups about it online, but as a start
> here's the 16 minute lightning talk by Brian Brazil, the Founder, giving
> the overview:
> https://www.youtube.com/watch?v=cwRmXqXKGtk
> 
> Aaron S. Joyner
> 
> 
> On Sat, Sep 10, 2016 at 11:38 PM, Matt Pusateri via TriLUG <
> trilug at trilug.org> wrote:
> 
>> The one thing I didn’t like about Zenoss was that it stores data in an
>> embedded database that’s hard to backup and deal with.  Upgrades were a
>> pain.  Simple UI, but a pain to manage long term.
>> 
>> Also I’d be remiss if I didn’t give you the Monitoring Sucks GitHub page,
>> which tracks a bunch of monitoring software: https://github.com/monitoringsucks/
>> 
>> 
>> One thing you probably do want is a script that monitors your VMs and
>> fixes the parent links. This way, when VMs migrate hosts, your network map
>> stays accurate.
>> 
>> 
>> Matt P.
>> 
>>> On Sep 10, 2016, at 9:43 PM, David Brain via TriLUG <trilug at trilug.org> wrote:
>>> 
>>> Just throwing in another option - Zenoss https://www.zenoss.org/ -
>>> have used it in the past; pretty simple to get up and running.
>>> 
>>> David.
>>> 
>>> 
>>> 
>>> On Sat, Sep 10, 2016 at 9:10 PM, Craig Cook via TriLUG
>>> <trilug at trilug.org> wrote:
>>>>> Looking for some suggestions on Open Source software to monitor
>>>>> hundreds of VMs and/or containers (99% Linux [Ubuntu/CentOS]).  Currently
>>>>> using Cacti, but it is very tedious to set up a new VM.  Would like to get
>>>>> the “typical” stats (CPU, RAM, HDD, Network, SWAP, etc).  GUI would be nice
>>>>> as well as an API to add new VMs and extract data.
>>>>
>>>>
>>>> For VM monitoring I recommend Check_mk RAW.  Configuring Nagios is
>>>> *painful*.  Check_mk uses Nagios as a backend and makes configuring it
>>>> easy.  Includes integrated RRD graphs and other features.  Also has an API.
>>>> There are plugins available to send metrics to other engines, e.g.
>>>> Graphite and friends.
>>>> Craig
>>>> 
>>>> --
>>>> This message was sent to: dbrain at gmail.com <dbrain at gmail.com>
>>>> To unsubscribe, send a blank message to trilug-leave at trilug.org from
>> that address.
>>>> TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
>>>> Unsubscribe or edit options on the web  :
>> http://www.trilug.org/mailman/options/trilug/dbrain%40gmail.com
>>>> Welcome to TriLUG: http://trilug.org/welcome
>> 
>> 

