Server Troubles

On May 6, 2007, /var on talon.trilug.org (web, mail, lists) began exhibiting failure consistent behavior. Kevin Otte presented the following error info to the sysadmin subcommittee:

find: /var/lib/mailman/archives/private/trilug/Week-of-Mon-20030811/019424.html:
Input/output error
hdc: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdc: read_intr: error=0x40 { UncorrectableError }, LBAsect=15750559,
sector=15750496
end_request: I/O error, dev 16:01 (hdc), sector 15750496
I/O error in filesystem ("ide1(22,1)") meta-data dev ide1(22,1) block
0xf05560 ("xfs_trans_read_buf") error 5 buf count 8192

With pilot, the new trilug server in place, but only partially configured, we found ourselves in the unique situation of having plenty of resources, but not much personal bandwidth to immediately fix the problem. I asked Kevin to perform a backup of /var pending further action.

Early on the morning of May 7, 2007, Tanner Lovelace presented a report of what was set up on pilot, and we decided to move talon's contents to pilot, rather
than putting some "temporary" solution in place to hold talon together.

This work is generally complete, but there are probably a few adjustments to be made, and some DNS info is still propagating. Additionally, it is not yet know whether mail sent to talon between the initial failure and the resolution is available or whether it was lost. If it is lost, please accept my apology in advance.

My thanks to the following people for their technical work and advisory:

Tanner Lovelace
Kevin Otte
Jeremy Portzer
Cristobal Palmer

We encourage everyone to read "How to Report Bugs Effectively" and report
any weirdness via the TriLUG Contact Form

Matt Frye
TriLUG SysAdmin