[TriLUG] Remote server monitoring

Aaron Joyner aaron at joyner.ws
Thu Sep 1 11:41:29 EDT 2005


Shane O'Donnell wrote:

>Or use OpenNMS, which already does all of this out of the box.
>
>Oh yeah, almost forgot, it also scales better, has a better user
>interface (from a usability perspective), and can use all of the
>plugins built for Nagios.
>  
>

Did you mention that its auto detection (perhaps its only feature 
advantage over Nagios) is notoriously prone to slaughtering the network 
(1), and that it's implemented entirely in Java (and consumes resources 
like your average Java application, accordingly)?  Also, don't forget to 
point out that it doesn't understand network topology, and as such will 
page you for services Y and Z that depend on X whenever X goes down, 
because you can't express the dependencies.  I'd definitely debate your 
usability point: because Nagios understands network topology, it's 
able to graph the network in a very clear and understandable manner, 
which makes blocking outages very fast to understand, and reduces 
troubleshooting time accordingly.  Not to mention that the graph is, in 
and of itself, a self-documenting diagram of your network, which makes 
a great teaching aid for new folks on staff, and it's the most likely 
document to be up to date in your entire documentation (as it's auto 
generated from the monitoring system).
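
To make the dependency point concrete, here is a minimal Python sketch of 
the idea.  It is not Nagios or OpenNMS code; the function name, the 
dictionary layout, and the service names X, Y and Z are all hypothetical, 
chosen only to mirror the example above.  It assumes each service has at 
most one upstream dependency.

```python
# Hypothetical sketch: page only for root-cause failures.
# Not taken from Nagios or OpenNMS; names and data are illustrative.

def services_to_page(depends_on, down):
    """Return the failed services that are not explained by a failed dependency.

    depends_on maps a service name to the service it depends on (or None).
    down is the set of services currently failing their checks.
    """
    to_page = []
    for service in sorted(down):
        parent = depends_on.get(service)
        blocked = False
        seen = set()  # guards against accidental dependency loops
        # Walk up the dependency chain; if any ancestor is down,
        # this failure is a symptom rather than a cause.
        while parent is not None and parent not in seen:
            if parent in down:
                blocked = True
                break
            seen.add(parent)
            parent = depends_on.get(parent)
        if not blocked:
            to_page.append(service)
    return to_page


depends_on = {"X": None, "Y": "X", "Z": "X"}
print(services_to_page(depends_on, {"X", "Y", "Z"}))  # ['X']
print(services_to_page(depends_on, {"Y"}))            # ['Y']
```

With the dependencies expressed, an outage of X produces one page instead 
of three, while Y failing on its own still pages.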

Have I mentioned enough, or should we continue the holy war?  :)  I can 
go on for pages...
Aaron S. Joyner

1 - In all fairness, I've never set up OpenNMS.  I've only seen it in 
use, on networks I've managed.  I've seen it set up by people I don't 
know, people I know are idiots, and one person who I think is rather 
knowledgeable.  In all 3 cases I've seen its auto detection utterly 
choke OpenSSH's ability to take new incoming connections, by effectively 
flooding the hosts with connections until the host runs out of 
resources.  I'd rather not have a "monitoring" daemon that has more than 
a few times been the source of the problem it's complaining about.  I'll 
gladly admit the remote possibility that this may have been the fault of 
3 separate and unrelated sets of people misconfiguring it in the same 
way on completely separate occasions.  But if that's the case, that's a 
design flaw in and of itself.  My solution to the problem usually goes 
something like this:
ps ax | grep '[j]ava' | awk '{print $1}' | xargs kill
but then again I've been called "closed minded" when it comes to Java, 
and I generally consider it a compliment.


