[TriLUG] Odd Network problem

Corey Mutter mutterc at nc.rr.com
Wed Feb 16 23:21:41 EST 2005


Chris Knowles wrote:
> Got a weird one.  
> 
> (Oh, regarding that crashing box, further investigation pointed at the 
> motherboard as a culprit.)
> 
> I've got a Nagios server in place that's been happily warning us of doom and 
> gloom for over a year.  It's one of the great success stories for Linux at 
> our company.
> 
> Until now.
> 
> Starting this morning, it has been randomly unable to ping various boxes on 
> our network.  That is, until you ping the nagios server from the "unpingable" 
> server.   Then Nagios can ping that server all it wants. 
> 
> This is all local network, no routing involved.  
> 
> Any ideas as to what could be causing this?  (This is a simple switched 
> network, and other than this seems to be working fine.)
> 
> Any help is appreciated.  
> 
> CJK
> 
As Jason mentions, this is an ARP problem. The nagios box is either not 
sending ARP requests or not listening to the replies. When another box 
ARPs for the nagios box, the nagios box hears that request and replies, 
populating its own ARP cache at the same time, so from then on it can 
send packets to that box. (I see this kind of one-way pingability a lot 
in my day job of debugging switch/routers.)
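
A quick way to confirm the cache theory on the nagios box is to look for 
ARP entries that never resolved. Below is a minimal sketch, assuming a 
Linux box that exposes its cache in the usual /proc/net/arp layout; the 
function name and the sample addresses are made up for illustration.

```python
# Sketch: list ARP cache entries that never resolved, given text in the
# Linux /proc/net/arp layout. The sample addresses below are made up.

ATF_COM = 0x02  # "completed" flag bit used in the Linux ARP table


def incomplete_entries(arp_table_text):
    """Return the IP addresses whose cache entry has no resolved MAC."""
    unresolved = []
    lines = arp_table_text.strip().splitlines()
    for line in lines[1:]:  # skip the column header line
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore anything that is not a full entry
        ip, flags = fields[0], int(fields[2], 16)
        if not flags & ATF_COM:
            unresolved.append(ip)
    return unresolved


# Hypothetical table: one resolved neighbour, one incomplete one.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.10     0x1         0x2         00:11:22:33:44:55     *        eth0
192.168.1.20     0x1         0x0         00:00:00:00:00:00     *        eth0
"""

if __name__ == "__main__":
    print(incomplete_entries(SAMPLE))
```

On a real box you would feed it open("/proc/net/arp").read() right after 
a failed ping. If the "unpingable" host shows up as incomplete, and then 
resolves once that host pings the nagios box, that matches the behaviour 
described above.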

The best bet is to run the all-seeing, all-knowing Ethereal, or its 
command-line cousin tcpdump, on both the nagios box and the 'other' box 
(you needn't even put the interfaces in promiscuous mode, as you're 
tracing packets destined to the boxes in question). Then you can see 
what's going wrong with those ARP packets.
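
For reference, here is roughly what you are looking for in such a trace 
(with tcpdump, something like "tcpdump -n -e -p arp" should limit the 
capture to ARP without promiscuous mode). The sketch below is only an 
illustration of the ARP packet layout, not a replacement for the sniffer: 
decode_arp and build_request are hypothetical helper names, and the 
addresses are invented. It decodes a raw Ethernet frame and reports 
whether it is an ARP request (opcode 1) or reply (opcode 2).

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ARP_OPS = {1: "request", 2: "reply"}


def format_mac(raw):
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join("%02x" % b for b in raw)


def decode_arp(frame):
    """Decode an Ethernet frame carrying IPv4 ARP; return None otherwise."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None  # not Ethernet/IPv4 ARP
    return {
        "op": ARP_OPS.get(oper, "unknown"),
        "sender_mac": format_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": format_mac(tha),
        "target_ip": socket.inet_ntoa(tpa),
    }


def build_request(sender_mac, sender_ip, target_ip):
    """Build a broadcast ARP request frame (used here only as test data)."""
    eth = struct.pack("!6s6sH", b"\xff" * 6, sender_mac, ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,  # Ethernet, IPv4, lengths, opcode 1 = request
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )
    return eth + arp


if __name__ == "__main__":
    # Made-up addresses standing in for the nagios box and a target host.
    frame = build_request(bytes.fromhex("001122334455"),
                          "192.168.1.5", "192.168.1.20")
    print(decode_arp(frame))
```

Reading the trace then comes down to two questions. If no request from 
the nagios box ever appears, it is not sending them. If requests go out 
and replies come back to it on the wire but pings still fail, it is not 
listening to the replies.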

Corey


