[TriLUG] debugging random reboot
Phillip Rhodes
mindcrime at cpphacker.co.uk
Mon Dec 4 22:16:43 EST 2006
One of my Linux boxes has recently taken to rebooting itself at random
intervals, and I'm at my wits' end trying to figure out why. I'm hoping
somebody here might have some suggestions. Here's what I know / have done
so far:
1. I've been in the house (albeit in a different room) when it rebooted,
and there was no power event. Additionally, none of my other
boxes are rebooting. I think it's safe to eliminate power events
even though the box isn't on a UPS.
2. Installed Memtest86 and booted into that. It ran for about
9 hours and found no memory errors. To my mind that
eliminates two possibilities, memory and power supply, since the
box never stays up 9 hours when it boots into the OS.
3. Checked CPU temperature and fan speed; both look normal.
4. Checked the hard drive with the short offline test using smartctl and
found no problems.
5. Suspected somebody was rebooting me remotely using
some Apache exploit, so I shut down port 80 traffic to the
box, which did not help.
6. Examined /var/log/messages, the httpd logs, jboss logs, etc.,
and found nothing that looked unusual.
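
As a rough way to re-check items 5 and 6, the sketch below scans syslog-style lines for boot markers, shows what was logged just before each one, and reports the time between boots. It is only an illustration under stated assumptions: that the syslog daemon writes a line containing "syslogd ... restart" at startup and "exiting on signal" on a clean shutdown (as sysklogd typically does), and that a modern Python 3 is available. The function names are made up for this example. A boot with no clean-shutdown line before it points toward an abrupt reset rather than a software-initiated reboot.

```python
import re
from datetime import datetime

# Assumption: the syslog daemon logs "syslogd <version>: restart." at startup.
# Adjust this pattern if your daemon writes a different startup line.
BOOT_MARKER = re.compile(r"syslogd .*restart")
# Assumption: a clean shutdown leaves an "exiting on signal" line behind.
CLEAN_MARKER = "exiting on signal"


def parse_time(line, year):
    """Parse the 'Dec  4 22:16:43' prefix of a classic syslog line.

    Classic syslog timestamps carry no year, so the caller supplies it.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_boots(lines, year, context=3):
    """Return (boot_time, clean_shutdown_seen, preceding_lines) per boot."""
    boots = []
    for i, line in enumerate(lines):
        if BOOT_MARKER.search(line):
            before = [l.rstrip("\n") for l in lines[max(0, i - context):i]]
            clean = any(CLEAN_MARKER in l for l in before)
            boots.append((parse_time(line, year), clean, before))
    return boots


def uptimes(boots):
    """Time between consecutive boots (an upper bound on each uptime)."""
    times = [b[0] for b in boots]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def report(path, year):
    """Print each boot found in the log file at `path`, with its context."""
    with open(path) as f:
        boots = find_boots(f.readlines(), year)
    for when, clean, before in boots:
        kind = "clean shutdown" if clean else "no shutdown logged"
        print(f"boot at {when} ({kind})")
        for l in before:
            print(f"    {l}")
    for gap in uptimes(boots):
        print(f"time between boots: {gap}")
```

Calling report("/var/log/messages", 2006) as root would print one entry per detected boot. If the gaps are all well under 9 hours and no shutdown line ever precedes a boot, that is consistent with the hardware theory rather than someone issuing a reboot.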
At this point I suspect a hardware problem, but I'm not sure
what to try next. I think memory and power supply are out
as possibilities, which leaves the CPU or the motherboard as
the most likely culprits. Other than swapping either or both
out for a different part, I don't really know of any way to test
that theory... any suggestions?
The other possibility might be a rootkit of some sort (this box
is exposed to the public Internet, so anything's possible I guess).
If it matters, the box is running CentOS 4.2; uname -a reports:
Linux mariner 2.6.9-22.0.1.EL #1 Thu Oct 27 12:26:11 CDT 2005 i686 i686 i386 GNU/Linux
TTYL,
Phil