[TriLUG] Diagnosing RCU stall warnings
Brian Henning via TriLUG
trilug at trilug.org
Mon Dec 7 12:27:58 EST 2015
Hi folks,
We had a server become unresponsive recently. It still accepted TCP connections, but the underlying services never responded, and the console showed a long series of "self-detected stall" messages.
I found this link:
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
which talks about what a stall warning means and some typical causes. It refers to examining stack traces to find the offender, but I don't know where to find said stack traces (after power-cycling the machine). I looked in a bunch of files in /var/log with no useful results.
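For reference, here is a rough sketch of the kind of search I mean, in case I was looking for the wrong thing. It scans log lines for the RCU stall message and keeps the lines that follow each hit, since that is where a stack trace would be if one made it to disk. The pattern is my guess based on the console text and the stallwarn.txt wording, and find_stalls and its context parameter are names I made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed message shapes (taken from the console text and stallwarn.txt):
#   "INFO: rcu_sched self-detected stall on CPU ..."
#   "INFO: rcu_sched detected stalls on CPUs/tasks: ..."
STALL_RE = re.compile(r"rcu\w* (?:self-)?detected stalls? on")


def find_stalls(lines: Iterable[str], context: int = 20) -> List[Tuple[int, List[str]]]:
    """Return (line_number, block) for each stall message found.

    Each block holds the matching line plus up to `context` following
    lines, where a stack trace would normally appear.
    """
    buf = [line.rstrip("\n") for line in lines]
    hits = []
    for i, line in enumerate(buf):
        if STALL_RE.search(line):
            hits.append((i + 1, buf[i:i + 1 + context]))
    return hits
```

It can be fed an open file, e.g. find_stalls(open("/var/log/kern.log", errors="replace")); the path is only an example of a file I would expect kernel messages to land in on Debian. If the machine was too wedged to write logs before the power cycle, this will of course find nothing.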
The running kernel (4.1.9) is much newer than the one the installed Debian release shipped with (2.6.32), because we needed some newer features. Could some outdated system utility be causing problems against the newer kernel? We've had one or two kernel panics on the machine recently as well, but I don't have records of the cause(s). Should I just rebuild the OS?
Thanks,
-Brian