[TriLUG] System overload issues
Max TenEyck Woodbury
max at mtew.isa-geek.net
Fri May 24 10:25:01 EDT 2013
On 05/24/2013 09:56 AM, Bill Farrow wrote:
> On Fri, May 24, 2013 at 9:34 AM, Brian McCullough <bdmc at buadh-brath.com> wrote:
>> Frequently during the day, the system (or the web sites) will become
>> non-responsive for periods ranging from one minute to well over an hour.
>
> Have you thought about putting limits on processes to prevent them
> from taking the system to its knees? I would start by looking at
> ulimit. If you can prevent the system from becoming unresponsive,
> then you can start investigating which process is going haywire and
> hopefully fix it properly.
>
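As a rough illustration of the ulimit idea, here is a minimal Python
sketch that caps a single child process instead of the whole shell. The
function name run_limited and the chosen limits are hypothetical; it
assumes a Linux or other POSIX system where the resource module is
available.

```python
import resource
import subprocess


def run_limited(cmd, cpu_seconds, mem_bytes=None):
    """Run cmd with a CPU-time cap (like `ulimit -t`) and, optionally,
    an address-space cap (like `ulimit -v`, but in bytes)."""

    def apply_limits():
        # Runs in the child just before exec, so only the child is capped.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        if mem_bytes is not None:
            resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    # On POSIX, a negative returncode means the child died from a signal,
    # which is what happens when it exceeds its CPU limit.
    return subprocess.run(cmd, preexec_fn=apply_limits)
```

A runaway job started this way is killed by the kernel once it has used
its CPU allowance, which leaves the rest of the system usable while you
look for the real cause.
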
>> Now, other things seem to be showing failure symptoms; for instance, bzip2, which
>> compresses the MySQL database backup, seems to take hours instead of minutes;
>
> How big is the MySQL dump file that is being compressed? Time how
> long it takes when the system is running normally, and compare with
> when the system is under load.
>
> time bzip2 test-backup
>
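The timing comparison above can also be scripted. The sketch below is
only an illustration (time_command is a made-up name): it reports both
wall-clock time and the CPU time the child actually used. If the wall
time is far larger than the CPU time, the command was probably waiting
(on disk, for example) rather than compressing.

```python
import resource
import subprocess
import time


def time_command(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds) for the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system CPU time consumed by the finished child process.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


# Example (assumes a file named test-backup exists; -k keeps the original):
#   wall, cpu = time_command(["bzip2", "-k", "test-backup"])
#   print(f"wall {wall:.1f}s, cpu {cpu:.1f}s")
```

Run it once when the machine is healthy and once during a slowdown, and
compare the two pairs of numbers.
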
>
> I'm going to second Ron Kelley's suggestion that it might be a bad
> hard drive. Check dmesg and syslog for hard drive error messages. I
> had this happen on a RAID1 (mirror) system at work: it would normally
> run fine but slow to a snail's pace whenever it happened to read a bad
> block on one of the drives. I was disappointed that Linux software
> RAID1 did not help in this situation.
>
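For the dmesg and syslog check, a small filter saves some scrolling.
This is a sketch only: the pattern list covers a few common kernel disk
error phrases and is certainly not exhaustive, and find_disk_errors is a
hypothetical helper name.

```python
import re

# A few phrases the kernel commonly logs for failing disks.
# Illustrative only; extend it for your own hardware and drivers.
DISK_ERROR_PATTERNS = re.compile(
    r"I/O error|failed command|exception Emask|"
    r"Unrecovered read error|Medium Error",
    re.IGNORECASE,
)


def find_disk_errors(lines):
    """Return the log lines that look like disk errors."""
    return [line.rstrip("\n") for line in lines if DISK_ERROR_PATTERNS.search(line)]


# Example: feed it a saved log, e.g. the output of `dmesg > dmesg.txt`:
#   with open("dmesg.txt") as log:
#       for hit in find_disk_errors(log):
#           print(hit)
```

If the same device and sector numbers keep turning up in the matches,
that points at the drive rather than at a runaway process.
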
> Bill
>
Try running a temperature monitor on the disks. I have seen cases
where very warm drives simply take forever to process requests without
actually going bad. In particular, I had one drive that ate its
machine's CPU until I put a cooler on it. I still use the drive, but
not for operations that put it under heavy load for any length of time.
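
One way to watch the temperature is to poll smartctl from smartmontools.
The sketch below assumes smartctl is installed, that the script runs
with enough privilege to query the drive, and that the drive reports
temperature under the usual attribute names; not every drive does. The
function names and the device path are placeholders.

```python
import subprocess

# Attribute names under which drives commonly report temperature.
TEMPERATURE_ATTRIBUTES = ("Temperature_Celsius", "Airflow_Temperature_Cel")


def parse_temperature(smartctl_output):
    """Extract the temperature in Celsius from `smartctl -A` output.

    Returns None if no known temperature attribute is present.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the tenth.
        if len(fields) >= 10 and fields[1] in TEMPERATURE_ATTRIBUTES:
            return int(fields[9])
    return None


def read_temperature(device):
    """Query one drive, e.g. read_temperature("/dev/sda")."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_temperature(result.stdout)
```

Logging this once a minute alongside the load average should show
whether the slowdowns line up with the drive heating up.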