[TriLUG] DMA interrupt recovery

Marty Ferguson marty at rtmx.net
Wed Feb 11 14:56:39 EST 2004


Good luck, Douglas


I've seen this on my "Wally" system
(AKA Walter, Walton, or Waldo, depending on which OS I bring up...
You'll never guess where I bought this hunk of iron.
The power supply failed within the first 6 months...)

I've added a couple of EIDE drives to Wally, and I've seen this same
problem come and go.  It's gone for now.  My recent attempt at installing
Fedora failed the first time through while creating the ext3 filesystems,
which is well into the install.  I had not seen the "lost interrupt"
message for a couple of months before that.
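
For anyone trying to pin down how often the problem "comes and goes", here
is a minimal sketch that tallies the message per drive.  It assumes the
kernel logs it in the form "hda: lost interrupt"; the function name is made
up for illustration, and the pattern may need adjusting for other logs.

```shell
#!/bin/sh
# count_lost_interrupts [FILE...]
# Tally "lost interrupt" kernel messages per IDE device.
# Reads the named log files, or stdin when no file is given.
# Assumption: messages look like "hda: lost interrupt".
count_lost_interrupts() {
    # -o prints only the matching part, so syslog timestamps and
    # hostnames do not stop identical messages from being grouped.
    cat "$@" | grep -o 'hd[a-z]: lost interrupt' | sort | uniq -c
}
```

Possible usage: `dmesg | count_lost_interrupts`, or pass a saved log file as
the argument.  No output means no matching lines were found.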

So I don't know what the source is, but when I have encountered it in the
past on Wally, I have reseated my drive cables.  For the Fedora install,
there was nothing on the disk I cared about, so I instituted the
/dev/slash/burn policy on the HD partition table through the vtty3 console
early in the 2nd (and final) Fedora install.  Everything ended up just
peachy.

So my suspicion is that at some critical period early in the life of
the drive, there was some mildly flaky cable problem, and that, combined
with a bad bit or CRC code or some other firmware "FM" somewhere in the
drive/drive electronics/firmware, this problem rears its ugly head on
occasion (e.g., due to temp, humidity, stray quarks, phase of moon  ;-)

This isn't to say that the described symptoms do not generally indicate a
hard drive failure, but if you are seeing this problem with a frequency
that is not consistent with 5 to 10,000 MTBF, then the root cause is
probably not hard drive failures.  (Presuming, of course, you observe
basically sound ESD precautions and don't go around willy-nilly blasting
semiconductors with unseen static electricity.)
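
The frequency argument above can be checked with back-of-the-envelope
arithmetic.  This is a rough sketch only: it assumes a constant failure
rate, so expected failures = drives * hours / MTBF.  The function name and
any numbers fed to it are placeholders, not figures from this thread.

```shell
#!/bin/sh
# expected_failures DRIVES HOURS MTBF_HOURS
# Print the expected number of drive failures over HOURS of operation,
# assuming a constant failure rate (a simplification).
expected_failures() {
    # LC_ALL=C keeps the decimal separator a "." regardless of locale.
    LC_ALL=C awk -v n="$1" -v h="$2" -v m="$3" \
        'BEGIN { printf "%.2f\n", n * h / m }'
}
```

For example, `expected_failures 20 1000 10000` prints 2.00.  If the number
of failures actually observed is far above what the drives' rated MTBF
predicts, something other than the drives themselves is the likelier
culprit.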

Summary - just saw your last posting...
Yes, buy top grade new cables and see if it goes away.  If it does, great!
If a different intermittent problem then recurs, then you found the root
cause, but you'll need to back up your filesystem (tar, not dd), recreate
the drive, and then restore.
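
A minimal sketch of the file-level backup and restore described above.  The
function names are made up for illustration, and the step in between
(repartitioning and running mkfs on the drive) is deliberately left out.
Presumably the point of tar over dd is that a file-level copy does not
carry the old on-disk layout along with it.

```shell
#!/bin/sh
# backup_tree SRC_DIR ARCHIVE
# Archive everything under SRC_DIR at the file level, keeping permissions.
# Keep ARCHIVE outside SRC_DIR so tar does not try to archive itself.
backup_tree() {
    tar -cpf "$2" -C "$1" .
}

# restore_tree ARCHIVE DEST_DIR
# Unpack the archive into DEST_DIR, e.g. the mount point of the
# freshly recreated filesystem.
restore_tree() {
    mkdir -p "$2" && tar -xpf "$1" -C "$2"
}
```

Possible usage, with hypothetical paths: `backup_tree /mnt/olddisk
/backup/olddisk.tar`, recreate the partition and filesystem, mount it, then
`restore_tree /backup/olddisk.tar /mnt/newdisk`.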

My 2 bits
Marty

=========================
PS I'm really not kidding about the ESD.  I once worked with a
human static generator we nicknamed "sparky"
=========================



-----Original Message-----
From: trilug-bounces at trilug.org [mailto:trilug-bounces at trilug.org] On
Behalf Of Douglas Kojetin
Sent: Wednesday, February 11, 2004 11:31 AM
To: Triangle Linux Users Group discussion list
Subject: Re: [TriLUG] DMA interrupt recovery


oye!  this'll be my second in a month or so (which is significant for
the relatively few numbers of computers we have vs. the # of times i've
seen this happen before -- at least to me!).  i just ran WD diag tools
quick test -- it passed.  but, i'm going to rsync the data to another
computer (hopefully) and run the extended test to be sure next.


On Feb 11, 2004, at 11:14 AM, Jason Tower wrote:

> probably a failed hard drive, i've seen this on at least four drives in
> the last six months.  you might try turning off DMA with
> 'hdparm -d0 /dev/hda' but then performance will be abysmally slow, if it
> works at all.

--
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
TriLUG PGP Keyring         : http://trilug.org/~chrish/trilug.asc



