[TriLUG] Opteron >4GB w/RHEL 3
Mark T. Voelker
markvoelker at fast-mail.org
Mon Feb 16 09:12:18 EST 2004
I'm working on setting up some new lab hardware, among which is a shiny
new dual Opteron server running RHEL 3. The box has two Opteron 244
CPUs and 6GB of DDR ECC RAM installed in six 1GB sticks (there are 8
total DIMM slots). The motherboard is a Tyan S2882 (a.k.a. Thunder K8S
Pro). Everything seems to run fine in the limited testing I've done so
far, but every few seconds I see this appear in the syslog:
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: CPU 0: Silent Northbridge MCE
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: Northbridge status a40000000005001b
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: GART TLB error generic level generic
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: extended error gart error
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: link number 0
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: error address valid
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: error uncorrected
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: previous error lost
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: error address 00000000fafe1a68
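For reference, the status word in that log can be picked apart by hand. The following is only a sketch in Python, assuming the usual x86 machine-check status layout (bit 63 valid, bit 61 uncorrected, bit 60 reporting enabled, bit 58 address valid) plus two field positions that are assumptions here rather than something taken from the kernel source: link number in bits 38:36 and the extended error code in bits 19:16. The function name decode_nb_status is made up for illustration.

```python
def decode_nb_status(status):
    """Split a 64-bit machine-check status word into a few named fields.

    Bit positions are assumptions based on the generic x86 MCA layout;
    check them against the processor documentation before relying on them.
    """
    error_code = status & 0xFFFF
    return {
        "valid": bool((status >> 63) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "reporting_enabled": bool((status >> 60) & 1),
        "address_valid": bool((status >> 58) & 1),
        "link": (status >> 36) & 0x7,           # assumed field position
        "extended_code": (status >> 16) & 0xF,  # assumed field position
        "error_code": error_code,
        # In the TLB error format (0000 0000 0001 TTLL) the low four
        # bits carry the transaction type (TT) and cache level (LL).
        "tlb_type": (error_code >> 2) & 0x3,
        "tlb_level": error_code & 0x3,
    }


# Status word copied from the syslog excerpt above.
fields = decode_nb_status(0xA40000000005001B)
for name, value in fields.items():
    print(f"{name:18} {value}")
```

Under those assumptions the "address valid" and "uncorrected" bits come out set and the reporting-enabled bit comes out clear, which would fit the kernel calling this a "silent" MCE.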
I thought this might be bad RAM, but when I pull out two sticks (*any*
two), the error message goes away. It seems to be tied to the fact that
I have >4GB of memory. According to the
motherboard manual, when you use more than 6 DIMMs on this board, you're
using a 128-bit (interleaved) memory configuration as opposed to a
64-bit (noninterleaved) configuration with 4 or fewer DIMMs (ref. page
30 of ftp://ftp.tyan.com/manuals/m_s2882_101.pdf), if that's any hint.
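One small piece of arithmetic that may or may not be relevant: the error address from the log can be compared against the 4GB boundary. This is just a sketch; the variable names are made up, and whether the position of the address means anything is an open question, not a conclusion.

```python
GIB = 1 << 30
MIB = 1 << 20

# Error address copied from the syslog excerpt.
error_address = 0x00000000FAFE1A68

# Is the address below the 4GB line, and by how much?
below_4g = error_address < 4 * GIB
distance_mib = (4 * GIB - error_address) / MIB

print(f"below 4GB: {below_4g}")
print(f"distance from 4GB boundary: {distance_mib:.1f} MiB")
```

The address falls within the top 128 MiB under 4GB, so it is at least close to the boundary that the ">4GB" observation points at.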
I've tried rearranging the DIMMs in every valid way listed in the
motherboard's manual to no avail. I even ran memtest86
(www.memtest.org) just to be sure I didn't have bad RAM. I'm using the
stock RHEL kernel 2.4.21-9.ELsmp and had the same problem on 2.4.21-4.ELsmp.
The box seems to run fine, but those errors clogging up my syslog have
me worried.
Anyone know what might be happening here? I'm not sure whether to
complain to the vendor that something is fishy with their hardware or
whether this is a software issue.
At Your Service,
--
Mark T. Voelker
[root@localhost root]# free
             total       used       free     shared    buffers     cached
Mem:       5976880     657272    5319608          0     105080     222268
-/+ buffers/cache:     329924    5646956
Swap:      2040244          0    2040244
[root@localhost root]# uname -a
Linux localhost.localdomain 2.4.21-9.ELsmp #1 SMP Thu Feb 12 16:03:39 EST 2004 x86_64 x86_64 x86_64 GNU/Linux