[TriLUG] clustering or server mirroring
Aaron S. Joyner
aaron at joyner.ws
Wed Apr 20 07:23:19 EDT 2005
Magnus Hedemark wrote:
> David McDowell wrote:
>
>> One and the same? Here's my idea. I'd like to use CentOS 4 if
>> possible to do this. I would like to have my webserver mirrored on
>> another machine so that if one goes down, the site continues to run.
>> If I change a config on one machine, the config should change on the
>> mirrored machine. Is this running a cluster or is this some other
>> kind of setup? Basically I have some time at work to play. Any good
>> resources for this kind of information? Basically I want 2 servers to
>> be identical mirrors of one another so that if one of the 2 goes down,
>> I'm still online. And, if I repair the broken one, it can resync
>> itself so that the mirror of the 2 machines is identical again.
>> Suggestions, links, etc?
>>
>
> Since CentOS 4 is, despite what RHAT's lawyers say, technically pretty
> much RHEL 4, you can follow the admin docs for RHEL 4 to see how
> clustering works in there. The way we're doing it at $WORK requires
> access to a LUN on a SAN that is unmasked to both servers, though, so
> I'm not quite sure how you would pull it off without some sort of
> shared external SCSI or SAN storage.
I wanted to comment on this thread yesterday, but time did not allow.
As I've been looking into this a lot lately, I'll quickly summarize the
available options for "clustering" as I understand them so far. I'm
primarily familiar with high-availability (HA) clustering, as opposed to
compute-performance clustering. The latter is geared toward parallel
processing of individual chunks of data, shipping those chunks between
nodes over an interconnect like Myrinet, InfiniBand, or even Gig-E. It
requires dedicated software which understands how to divide up the
processing task, and isn't really what you're after. Having gotten that
out of the way, on to HA clustering...
For clustering web traffic, Ultramonkey has been mentioned already; it's
a very nice bundling of heartbeat and the other tools from the linux-ha
project with LVS (Linux Virtual Server). I have seen it used with some
success in production for doing pretty much what you describe. If you
need a group of web servers to appear as "one" web server which never
goes down, it fits the bill. It essentially allows you to take a pair of
very simple, low-powered machines and make a failover pair out of them.
On that failover pair you run LVS, which load-balances across the
webserver "farm" behind it. This provides a very redundant system:
either of the front-end machines can fail, or any of the back-end
machines can fail, and traffic will continue to be served accordingly.
It doesn't, however, solve the problem of dividing up your application
if it's not currently capable of being run on more than one server...
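
As a rough illustration of what the director pair does, here is the sort
of thing LVS gets configured to do under the hood. Ultramonkey normally
drives this through ldirectord rather than by hand, and the addresses
below are invented for the example:

  # On the active director: define a virtual HTTP service on the
  # floating IP, balanced round-robin...
  ipvsadm -A -t 192.168.1.100:80 -s rr
  # ...and add the real web servers behind it (-m = NAT/masquerading;
  # direct routing via -g is the other common choice).
  ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.11:80 -m
  ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.12:80 -m

When the active director dies, heartbeat moves the floating IP (and with
it the LVS service) over to its partner, so clients keep getting served.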
Consider something other than web services. Something like Cyrus IMAP,
for example, is notoriously difficult to set up for high availability.
The traditional way to handle the problem is to use one big box, make it
as redundant as possible, and hope it doesn't go down. Anything more
complicated involves a "murder" of IMAP servers (I'm not making that
name up), with a specially designed redirector and a lot more machines.
Essentially, this becomes a configuration and management nightmare,
unless you *really* need to scale well beyond what one box can serve up
anyway. With Cyrus, you can't even share the backing storage over NFS,
because its locking doesn't play well with NFS. So, how do you provide
good email services? Enter DRBD.

DRBD is short for Distributed Replicated Block Device, and it's exactly
what it sounds like: a method for creating a block device in Linux, and
then mirroring all changes to that block device to a redundant block
device on another computer over the network (usually over a dedicated
Gig-E link). You then format this block device with your journaling
filesystem of choice (after all, a disk is just a block device to
Linux).
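
To make that concrete, a DRBD resource is described in /etc/drbd.conf,
identically on both nodes. A minimal sketch might look something like
this (host names, disks, and addresses are made up, and the exact syntax
varies a bit between DRBD releases, so check the drbd.conf man page):

  resource r0 {
    protocol C;              # synchronous: a write is acknowledged only
                             # after it has hit the disk on both nodes
    on mail1 {
      device    /dev/drbd0;  # the device you actually format and mount
      disk      /dev/sdb1;   # local backing partition
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on mail2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }

You mkfs and mount /dev/drbd0 only on whichever node is currently
primary; the secondary just quietly receives the stream of writes.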
You then combine the aforementioned linux-ha project (with heartbeat and
IP failover) in such a fashion that when the first machine fails, the
second machine Shoots The Other Node In The Head (STONITH) to ensure
that it's really off, then mounts the redundant copy of the filesystem,
starts the services, takes over the IP address, and starts serving up
traffic. All within about 10-15 seconds. With this way of doing things
you can serve up redundant NFS, redundant Cyrus, redundant Jabber, or
whatever you need. Even basic clustering of an Apache server works well
in this case, if all you need is failover without load balancing. It
also somewhat reduces the complexity of the picture, as you don't have
LVS involved monkeying (no pun intended) with the packets (which is
required for scaling much beyond two machines).
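
For the heartbeat side, the old v1-style setup is just a couple of small
files on each node. Roughly (names and addresses are invented again, and
the resource scripts available depend on your heartbeat and DRBD
packages):

  # /etc/ha.d/ha.cf -- who the cluster members are and how they talk
  keepalive 2
  deadtime 10
  bcast eth1               # dedicated crossover link for heartbeats
  node mail1 mail2
  # a stonith directive would also go here, pointing at whatever
  # remote power switch you use to shoot the dead node in the head

  # /etc/ha.d/haresources -- what the active node owns: floating IP,
  # promote the DRBD resource, mount it, start Cyrus
  mail1 IPaddr::192.168.1.50/24 drbddisk::r0 Filesystem::/dev/drbd0::/var/spool/imap::ext3 cyrus-imapd

That single haresources line is the whole failover policy: whichever
node heartbeat decides is alive takes the IP, the disk, and the service.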
There is, of course, the classic way of clustering, involving external
shared storage. It's very similar to the DRBD description above (in fact
the story should be told in reverse, as DRBD is based on this idea):
you have an external source of shared storage, such as a SAN fabric or a
shared SCSI disk subsystem, and instead of mirroring the data back and
forth over a dedicated Gigabit link, it's simply stored on media which
both machines have access to. The obvious problem here is that Brocade
switches and Fibre Channel disk controllers / disk enclosures aren't
cheap equipment. :) Even external SCSI arrays are usually a bit of
overkill for the task at hand, not to mention equally expensive.
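
If you do go the shared-storage route, the heartbeat configuration looks
almost the same as the DRBD sketch above; the haresources line just
mounts the shared LUN directly instead of promoting a DRBD resource
(device name invented here):

  # /etc/ha.d/haresources -- shared-storage variant, no drbddisk step
  mail1 IPaddr::192.168.1.50/24 Filesystem::/dev/sdc1::/var/spool/imap::ext3 cyrus-imapd

STONITH matters even more in this setup, since both nodes can physically
write to the same disks and a split brain would corrupt the filesystem.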
Anyway, hopefully this has been a nice summary of the clustering options
that are available, and the purposes each is best suited for. By no
means am I authoritative (or even timely in this case), but maybe it's a
useful starting point for someone.
Aaron S. Joyner