[TriLUG] Replicating a filesystem across datacenters

Igor Partola igor at igorpartola.com
Tue Nov 4 15:11:44 EST 2014


It's not just an unsolved problem; it's pretty much proven unsolvable by
the CAP theorem: during a network partition you cannot have both
consistency and availability. Really it comes down to this:

1. You have a synchronous system. Writing to datacenter A means that A then
pushes the write to B, B does fsync() and acknowledges to A that it
happened; then A tells the client that the data is in fact stored.

In this case, if A cannot talk to B but both keep accepting writes, you
have conflicts. When the link comes back, you have no general way to
resolve them (think of both sides updating different parts of the same
file at the same time, with no timestamps to fall back on). Alternatively,
you can make it a rule that nothing can be written while A cannot talk to
B (and vice versa). Then you don't really have high availability, though
you are guaranteed consistency.
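
The "refuse writes during a partition" variant of system #1 can be
sketched roughly as below. This is only an in-memory Python illustration,
not a real replication implementation: the Datacenter class, the link_up
flag and PartitionError are made-up names, and a dictionary assignment
stands in for "write the data and fsync()".

```python
class PartitionError(Exception):
    """Raised when the peer datacenter cannot be reached (hypothetical)."""


class Datacenter:
    def __init__(self, name):
        self.name = name
        self.store = {}       # path -> data; stands in for durable storage
        self.peer = None      # the other datacenter
        self.link_up = True   # set to False to simulate a partition

    def apply(self, path, data):
        # Stand-in for "write the data and fsync()" on the receiving side.
        self.store[path] = data
        return True           # acknowledgement sent back to the sender

    def write(self, path, data):
        # Rule: no writes at all while the peer is unreachable.
        # Consistency is kept; availability is given up.
        if not self.link_up:
            raise PartitionError(f"{self.name} cannot reach {self.peer.name}")
        self.store[path] = data               # write locally at A
        acked = self.peer.apply(path, data)   # push to B and wait for its ack
        return acked          # only now is the client told "stored"


def connect(a, b):
    """Make two datacenters each other's peer."""
    a.peer, b.peer = b, a
```

With the link up, a write lands on both sides before the client hears
back; with link_up set to False, write() raises and nothing is stored
anywhere.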

2. You have an async system. A write to A happens only at A, and is synced
to B some time later. You have to constantly deal with conflicts here, but
this system is faster than #1. It's also simpler for those creating it.

Here you have all the same problems as system #1, except that since you
are already forced to deal with conflicts even when the network is
healthy, you likely already have a strategy for handling them.
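
To make the async case concrete, here is a small Python sketch under
stated assumptions: AsyncDatacenter and sync_to are hypothetical names,
and conflicts are detected with a simple per-path version counter, which
is just one possible strategy and not something prescribed above. The
sketch only reports conflicts; deciding which side wins is left to
whatever policy you choose.

```python
class AsyncDatacenter:
    def __init__(self, name):
        self.name = name
        self.store = {}     # path -> (version, data)
        self.pending = []   # local writes not yet shipped to the peer

    def write(self, path, data):
        # The write happens only here; the peer hears about it later.
        base, _ = self.store.get(path, (0, None))
        self.store[path] = (base + 1, data)
        self.pending.append((path, base, data))
        return True         # acknowledged without contacting the peer

    def sync_to(self, peer):
        """Ship pending writes to the peer; return the paths that conflict."""
        conflicts = []
        for path, base, data in self.pending:
            peer_version, peer_data = peer.store.get(path, (0, None))
            if peer_version == base:
                # Peer has not changed this path since our write was based
                # on it, so the write applies cleanly.
                peer.store[path] = (base + 1, data)
            elif peer_data != data:
                # Both sides changed the same path independently.
                conflicts.append(path)
        self.pending.clear()
        return conflicts
```

Note that a conflict shows up here with no network failure at all: it is
enough for both sides to write the same path between two syncs.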

This all means that basically you need to figure out which type of system
you want first. Is data integrity the most important thing to you? If so,
choose system #1 in a mode where, if the link between A and B is broken (a
partition happens), the system goes offline.

If high availability is the most important, choose system #1 in a mode
where you resolve conflicts after the partition is over, or the downed data
center comes back online.

If high availability and high speed are most important, choose system #2.

An alternative to all of this is to use one datacenter as the read-write
one and the other as a read-only hot standby. This is typically the better
choice, though depending on your setup a few seconds of writes may be lost
when you fail over.
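
A rough Python sketch of that read-write/read-only arrangement, to show
where the lost writes come from. Node, replicate and fail_over are
hypothetical names for illustration, and the "log" is just a list in
memory, not any particular product's replication log.

```python
class ReadOnlyError(Exception):
    """Raised when a write is sent to the read-only side (hypothetical)."""


class Node:
    def __init__(self, name, writable):
        self.name = name
        self.writable = writable
        self.log = []   # ordered list of (path, data) writes

    def write(self, path, data):
        if not self.writable:
            raise ReadOnlyError(f"{self.name} is read-only")
        self.log.append((path, data))

    def read(self, path):
        # The latest write to a path wins.
        for logged_path, data in reversed(self.log):
            if logged_path == path:
                return data
        return None


def replicate(primary, standby):
    """Ship the log entries the standby has not seen yet."""
    standby.log.extend(primary.log[len(standby.log):])


def fail_over(standby):
    """Promote the standby; writes never shipped by the primary are gone."""
    standby.writable = True
    return standby
```

Any write the primary accepted after the last replicate() call is missing
on the standby once it is promoted. There are never conflicts to resolve,
because only one side accepts writes at any given time.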

Best of luck.
Igor

