[TriLUG] clustering or server mirroring
Aaron S. Joyner
aaron at joyner.ws
Wed Apr 20 16:55:14 EDT 2005
Mike Johnson wrote:
> Um, wow. You have to do all that [DRBD] to fail-over Cyrus? Ick.
> This is why maildir is so nice. Between IMAP/POP and SMTP, it's
> actually why maildir was created. Keep your spools on an NFS system
> and you can have multiple IMAP servers with simply an IP level load
> balancer and you're set. One of the IMAP servers dies? No big deal.
> The same can be said/done with SMTP. Both can easily scale to
> multiple systems. This relies on a reliable NFS system, but those
> aren't too expensive.
Well, keep in mind that this buys you more than just failover of Cyrus.
It also provides the "reliable NFS" system you describe, in that
the data is all safely mirrored between two ultimately redundant
computers (which may also provide their own redundancy against hardware
failure). The maildir (usually read: qmail) setup you describe above
only works in a situation where you have 3 servers or more. That's
usually not a problem, but it just pushes the "redundant single data
store" problem farther back in the mail system. Something still has to
provide a single, redundant copy of the data. It could very well be
DRBD serving up NFS from the 3rd (and now 4th) machine in your picture.
:) Although at that point, unless load is a concern at the qmail level,
you might as well integrate those 4 into a simple pair.
> On DRBD, what happens if the gigabit link between the systems fails?
> Does it scrag your filesystem?
Nope, though the file systems will most likely go out of sync, depending
on the circumstances. If you have an additional path for failover
monitoring (a null modem serial cable between the boxes is highly recommended,
as well as monitoring on the front-end Ethernet interfaces), then the
secondary will realize that only the gig-e link has failed. It will
receive no further updates of the file system until you repair this
link. Once the link is repaired, there is a "fast" checksum for
restoring sync between the two boxes, so that you don't have to copy
over the entire block device to resynchronize them.
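To make the resync step concrete, here is a toy Python model of checksum-based resynchronization: hash each fixed-size block on both sides and copy only the blocks whose hashes differ. It is a sketch under stated assumptions, not DRBD's implementation. The 4 KiB block size, the choice of SHA-1, and the function names are hypothetical, and a real system would compute the secondary's checksums on the secondary and send only the digests over the wire.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for this toy model


def block_digests(device: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-1 digest per fixed-size block of a device image."""
    return [
        hashlib.sha1(device[i:i + block_size]).digest()
        for i in range(0, len(device), block_size)
    ]


def resync(primary: bytes, secondary: bytearray, block_size: int = BLOCK_SIZE) -> list[int]:
    """Copy only mismatched blocks from primary to secondary; return their indices."""
    if len(primary) != len(secondary):
        raise ValueError("devices must be the same size")
    changed = []
    # Digests are computed up front, so copying blocks below does not affect the comparison.
    pairs = zip(block_digests(primary, block_size),
                block_digests(bytes(secondary), block_size))
    for idx, (a, b) in enumerate(pairs):
        if a != b:
            start = idx * block_size
            secondary[start:start + block_size] = primary[start:start + block_size]
            changed.append(idx)
    return changed
```

If a single byte in block 2 changed while the link was down, only that one block is copied back, which is the whole point of avoiding a full-device resync.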
In the case that the gig-e link fails, and that's your *only* way of
knowing that the other system is up, the secondary node would shoot the
primary in the head, mount up its copy of the block device, and
continue on with life. Now without STONITH (i.e. a way to remotely
power off the other machine), you're possibly in for some trouble...
you'd end up with a split-brain scenario, but that has only happened
because you've got a seriously poorly configured HA setup. :)
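The decision the secondary faces in the two scenarios above can be sketched as a small function. This is a hypothetical Python model, not Heartbeat's or DRBD's actual logic. The link names, the Action values, and the policy of treating any live path (including the replication link itself) as proof that the peer is up are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    NOTHING = auto()               # peer visible and data still replicating
    WAIT_FOR_REPLICATION = auto()  # peer alive, only the gig-e data link is down
    FENCE_AND_TAKE_OVER = auto()   # peer looks dead: power it off, then mount the device
    TAKE_OVER_UNFENCED = auto()    # peer looks dead, no STONITH: risks split brain


@dataclass
class LinkStatus:
    replication: bool  # gig-e DRBD link (hypothetical field names)
    serial: bool       # null modem heartbeat
    frontend: bool     # heartbeat over the front-end Ethernet


def decide(links: LinkStatus, have_stonith: bool) -> Action:
    # Any working path counts as evidence that the peer is still running.
    peer_seen = links.replication or links.serial or links.frontend
    if peer_seen:
        if links.replication:
            return Action.NOTHING
        # Secondary stops receiving updates until the link is repaired.
        return Action.WAIT_FOR_REPLICATION
    # No path reports the peer: only take over safely if we can fence it first.
    return Action.FENCE_AND_TAKE_OVER if have_stonith else Action.TAKE_OVER_UNFENCED
```

With a serial cable and front-end monitoring in place, losing only the gig-e link lands in WAIT_FOR_REPLICATION rather than a takeover; the split-brain case only arises when every path is gone and there is no way to fence the primary.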
So in short, it's not quite as bad as you've made it out, Mike.
Although, I'll be glad to concede that a mail system in general is not
the most convenient thing to make redundant with just two boxen.