[TriLUG] clustering or server mirroring
Aaron S. Joyner
aaron at joyner.ws
Wed Apr 20 07:23:19 EDT 2005
Magnus Hedemark wrote:
> David McDowell wrote:
>
>> One and the same? Here's my idea. I'd like to use CentOS 4 if
>> possible to do this. I would like to have my webserver mirrored on
>> another machine so that if one goes down, the site continues to run.
>> If I change a config on one machine, the config should change on the
>> mirrored machine. Is this running a cluster or is this some other
>> kind of setup? Basically I have some time at work to play. Any good
>> resources for this kind of information? Basically I want 2 servers to
>> be identical mirrors of one another so that if one of the 2 goes down,
>> I'm still online. And, if I repair the broken one, it can resync
>> itself so that the mirror of the 2 machines is identical again.
>> Suggestions, links, etc?
>>
>
> Since CentOS 4 is, despite what RHAT's lawyers say, technically pretty
> much RHEL 4, you can follow the admin docs for RHEL 4 to see how
> clustering works in there. The way we're doing it at $WORK requires
> access to a LUN on a SAN that is unmasked to both servers, though, so
> I'm not quite sure how you would pull it off without some sort of
> shared external SCSI or SAN storage.
I wanted to comment on this thread yesterday, but time did not allow.
As I've been looking into this a lot lately, I'll quickly summarize the
available options for "clustering" as I understand them so far. I'm
primarily familiar with high-availability (HA) clustering, as opposed to
compute-performance clustering. The latter is geared toward parallel
processing of individual chunks of data, shipping those chunks between
nodes over an interconnect like Myrinet, InfiniBand, or even Gig-E. It
requires dedicated software which understands how to divide up the
processing task, and isn't really what you're after. Having gotten that
out of the way, on to HA clustering...
For clustering web traffic, Ultramonkey has been mentioned already; it's
a very nice bundling of heartbeat and the other tools from the linux-ha
project with LVS (Linux Virtual Server). I have seen it used with some
success in production for doing pretty much what you describe. If you
need a group of web servers to appear as "one" web server which never
goes down, it fits the bill. It essentially allows you to take a pair of
very simple, low-powered machines and make a failover pair out of them.
On that failover pair you run LVS, which load-balances across the
webserver "farm" behind it. This provides a very redundant system:
either of the front-end machines can fail, or any of the back-end
machines can fail, and traffic will continue to be served accordingly.
It doesn't, however, solve the problem of dividing up your application
if it's not currently capable of being run on more than one server...
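
As a rough illustration of what the director pair does, here is the sort
of thing LVS gets configured to do under the hood. Ultramonkey normally
drives this through ldirectord rather than by hand, and the addresses
below are invented for the example:

  # On the active director: define a virtual HTTP service on the
  # floating IP, balanced round-robin...
  ipvsadm -A -t 192.168.1.100:80 -s rr
  # ...and add the real web servers behind it (-m = NAT/masquerading;
  # direct routing via -g is the other common choice).
  ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.11:80 -m
  ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.12:80 -m

When the active director dies, heartbeat moves the floating IP (and with
it the LVS service) over to its partner, so clients keep getting served.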
Consider something other than web services. Something like Cyrus IMAP,
for example, is notoriously difficult to set up for high availability.
The traditional way to handle the problem is to use one big box, make it
as redundant as possible, and hope it doesn't go down. Anything more
complicated involves a "murder" of IMAP servers (I'm not making that
name up), with a specially designed redirector and a lot more machines.
Essentially, this becomes a configuration and management nightmare,
unless you *really* need to scale well beyond what one box can serve up
anyway. With Cyrus, you can't even share the backing storage over NFS,
because its locking doesn't play well with NFS. So, how do you provide
good email services? Enter DRBD.

DRBD is short for Distributed Replicated Block Device, and it's exactly
what it sounds like: a method for creating a block device in Linux, and
then mirroring all changes to that block device to a redundant block
device on another computer over the network (usually over a dedicated
Gig-E link). You then format this block device with your journaling
filesystem of choice (after all, a disk is just a block device to
Linux).
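
To make that concrete, a DRBD resource is described in /etc/drbd.conf,
identically on both nodes. A minimal sketch might look something like
this (host names, disks, and addresses are made up, and the exact syntax
varies a bit between DRBD releases, so check the drbd.conf man page):

  resource r0 {
    protocol C;              # synchronous: a write is acknowledged only
                             # after it has hit the disk on both nodes
    on mail1 {
      device    /dev/drbd0;  # the device you actually format and mount
      disk      /dev/sdb1;   # local backing partition
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on mail2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }

You mkfs and mount /dev/drbd0 only on whichever node is currently
primary; the secondary just quietly receives the stream of writes.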
You then combine the aforementioned linux-ha project (with heartbeat and
IP failover) in such a fashion that when the first machine fails, the
second machine Shoots The Other Node In The Head (STONITH) to ensure
that it's really off, then mounts the redundant copy of the filesystem,
starts the services, takes over the IP address, and starts serving up
traffic. All within about 10-15 seconds. With this way of doing things
you can serve up redundant NFS, redundant Cyrus, redundant Jabber, or
whatever you need. Even basic clustering of an Apache server works well
in this case, if all you need is failover without load balancing. It
also somewhat reduces the complexity of the picture, as you don't have
LVS involved monkeying (no pun intended) with the packets (which is
required for scaling much beyond two machines).
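
For the heartbeat side, the old v1-style setup is just a couple of small
files on each node. Roughly (names and addresses are invented again, and
the resource scripts available depend on your heartbeat and DRBD
packages):

  # /etc/ha.d/ha.cf -- who the cluster members are and how they talk
  keepalive 2
  deadtime 10
  bcast eth1               # dedicated crossover link for heartbeats
  node mail1 mail2
  # a stonith directive would also go here, pointing at whatever
  # remote power switch you use to shoot the dead node in the head

  # /etc/ha.d/haresources -- what the active node owns: floating IP,
  # promote the DRBD resource, mount it, start Cyrus
  mail1 IPaddr::192.168.1.50/24 drbddisk::r0 Filesystem::/dev/drbd0::/var/spool/imap::ext3 cyrus-imapd

That single haresources line is the whole failover policy: whichever
node heartbeat decides is alive takes the IP, the disk, and the service.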
There is, of course, the classic way of clustering, involving external
shared storage. It's very similar to the DRBD description above (in fact
the story should be told in reverse, as DRBD is based on this idea):
you have an external source of shared storage, such as a SAN fabric or a
shared SCSI disk subsystem, and instead of mirroring the data back and
forth over a dedicated Gigabit link, it's simply stored on media which
both machines have access to. The obvious problem here is that Brocade
switches and Fibre Channel disk controllers / disk enclosures aren't
cheap equipment. :) Even external SCSI arrays are usually a bit of
overkill for the task at hand, not to mention equally expensive.
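
If you do go the shared-storage route, the heartbeat configuration looks
almost the same as the DRBD sketch above; the haresources line just
mounts the shared LUN directly instead of promoting a DRBD resource
(device name invented here):

  # /etc/ha.d/haresources -- shared-storage variant, no drbddisk step
  mail1 IPaddr::192.168.1.50/24 Filesystem::/dev/sdc1::/var/spool/imap::ext3 cyrus-imapd

STONITH matters even more in this setup, since both nodes can physically
write to the same disks and a split brain would corrupt the filesystem.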
Anyway, hopefully this has been a nice summary of the clustering options
that are available, and the purposes each is best suited for. By no
means am I authoritative (or even timely in this case), but maybe it's a
useful starting point for someone.
Aaron S. Joyner