[TriLUG] Remote server monitoring
Aaron Joyner
aaron at joyner.ws
Thu Sep 1 11:41:29 EDT 2005
Shane O'Donnell wrote:
>Or use OpenNMS, which already does all of this out of the box.
>
>Oh yeah, almost forgot, it also scales better, has a better user
>interface (from a usability perspective), and can use all of the
>plugins built for Nagios.
>
>
Did you mention that its auto-detection (perhaps its only feature
advantage over Nagios) is notoriously prone to slaughtering the network
(1), and that it's implemented entirely in Java (and accordingly consumes
resources like your average Java application)? Also, don't forget to
point out that it doesn't understand network topology, and as such will
page you for services Y and Z that depend on X whenever X goes down,
because you can't express the dependencies. I'd definitely debate your
user interface point: with its better understanding of network
topology, Nagios is able to graph the network in a very clear and
understandable manner, which makes blocking outages very fast to
spot and reduces troubleshooting time accordingly. Not to mention that
the graph is, in and of itself, a self-documenting diagram of your
network, which makes a great teaching aid for new folks on staff, and
it's likely the most up-to-date document in your entire documentation
set (as it's auto-generated from the monitoring system).
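To make the dependency point concrete, here is a minimal sketch of what expressing "Y and Z sit behind X" buys you. Nagios does this with the `parents` directive on host definitions and reports a host behind a failed parent as UNREACHABLE rather than DOWN; the shell function below only mimics that idea and is not how Nagios implements it. The input format (`host parent state`, one per line) and the host names are made up for illustration, and it only looks at each host's immediate parent.

```shell
#!/bin/sh
# Sketch of topology-aware alert classification (hypothetical input format).
# Each input line is: <host> <parent> <state>, where <parent> is "-" for a
# host with no parent and <state> is the raw check result (UP or DOWN).
# A DOWN host whose immediate parent is also DOWN is reported UNREACHABLE;
# any other DOWN host is reported DOWN. UP hosts are not printed.
classify() {
  awk '
    { parent[$1] = $2; state[$1] = $3 }
    END {
      for (h in state) {
        if (state[h] != "DOWN") continue
        p = parent[h]
        # "(p in state)" avoids creating new array entries mid-loop
        if ((p in state) && state[p] == "DOWN")
          print h, "UNREACHABLE"
        else
          print h, "DOWN"
      }
    }'
}

# Example: router1 fails, taking web1 and db1 (behind it) offline with it.
# Only hosts still classified DOWN get a page.
printf '%s\n' 'router1 - DOWN' 'web1 router1 DOWN' \
  'db1 router1 DOWN' 'mail1 - UP' |
  classify | sort |
  awk '$2 == "DOWN" { print "page:", $1 }'
```

With router1 down, only router1 generates a page; web1 and db1 are classified UNREACHABLE instead of producing two more pages for the same outage.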
Have I mentioned enough, or should we continue the holy war? :) I can
go on for pages...
Aaron S. Joyner
1 - In all fairness, I've never set up OpenNMS myself. I've only seen it
in use on networks I've managed. I've seen it set up by people I don't
know, by people I know are idiots, and by one person who I think is
rather knowledgeable. In all 3 cases I've seen its auto-detection
utterly choke OpenSSH's ability to accept new incoming connections by
flooding the hosts with connections until they ran out of resources.
I'd rather not have a "monitoring" daemon that has, more than a few
times, been the source of the problem it's complaining about. I'll
gladly admit the remote possibility that this was the fault of 3
separate and unrelated sets of people misconfiguring it in the same
way on completely separate occasions. But if that's the case, that's a
design flaw in and of itself. My solution to the problem usually goes
something like this:
ps ax | grep '[j]ava' | awk '{print $1}' | xargs kill
but then again, I've been called "closed-minded" when it comes to Java,
and I generally consider it a compliment.