[TriLUG] Replicating a filesystem across datacenters

Tue Nov 4 15:57:46 EST 2014

Ah, of course! I should have recognized this as a full-on CAP problem.

Your final option, read-only hot standby, is probably not acceptable for us,
though we'll take it if no other solution presents itself. That brings us
back to the original "rsync is too slow" problem.

Since my employer is targeting high availability, we can rule out #1.1
(CP). I'm thinking #2 (AP, with eventual consistency) is probably best for
us.

What systems implement this? I know it's at least possible, since I've seen
BitTorrent Sync do it, though my impression is that it may not be designed
with the robustness guarantees we need.

On Tue, Nov 4, 2014 at 3:11 PM, Igor Partola <igor at igorpartola.com> wrote:

> It's not even an unsolved problem, it's pretty much proven unsolvable
> according to the CAP theorem. Really it comes down to this:
>
> 1. You have a synchronous system. Writing to datacenter A means that A then
> pushes the write to B, B does fsync() and acknowledges to A that it
> happened; then A tells the client that the data is in fact stored.
>
> In this case, if A cannot talk to B, but A and B are both writing data, you
> have conflicts. When they come back online, you can't figure out how to
> resolve them (think updating different parts of the same file at the same
> time. No timestamps here.) Alternatively, you can create a rule that you
> cannot write anything if A cannot talk to B (and vice versa). Then you
> don't really have high availability, though you are guaranteed consistency.
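
To make the "cannot write anything if A cannot talk to B" rule concrete, here is a minimal sketch of system #1 in its consistency-first mode. Everything in it is an assumption made for illustration: the Datacenter class, the link_up flag and the in-memory dict standing in for the filesystem are hypothetical, and a real system would fsync() to disk and talk over the network.

```python
class Partitioned(Exception):
    """Raised when the peer datacenter cannot be reached."""


class Datacenter:
    def __init__(self, name):
        self.name = name
        self.store = {}        # path -> bytes, stands in for the filesystem
        self.peer = None       # the other datacenter
        self.link_up = True    # set to False to simulate a partition

    def replicate(self, path, data):
        # Runs on the peer: persist (fsync() in a real system), then acknowledge.
        self.store[path] = data
        return True

    def write(self, path, data):
        # Consistency-first rule: refuse the write unless the peer can acknowledge it.
        if not self.link_up:
            raise Partitioned(f"{self.name} cannot reach {self.peer.name}")
        acked = self.peer.replicate(path, data)
        if acked:
            self.store[path] = data   # only now is the client told "stored"
        return acked


a, b = Datacenter("A"), Datacenter("B")
a.peer, b.peer = b, a

a.write("/etc/motd", b"hello")      # stored at both A and B
a.link_up = b.link_up = False       # simulate a partition
try:
    a.write("/etc/motd", b"changed")
    refused = False
except Partitioned:
    refused = True                  # consistent, but unavailable for writes
```

During the simulated partition both sites still hold identical data, which is the consistency guarantee, and the price is that the write is rejected.
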
>
> 2. You have an async system. A write to A happens only at A, and is synced
> to B some time later. You have to constantly deal with conflicts here, but
> this system is faster than #1. It's also simpler for those creating it.
>
> Here you have all the same problems of system #1, except since you are
> already forced to deal with conflicts without any failures in networking,
> you likely already have a strategy for dealing with this.
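
One common way to notice conflicts in an async system without relying on timestamps is a version vector per file. This is a technique I'm swapping in for illustration, not something named above, and the Replica class, compare() and the in-memory storage are all hypothetical. A rough sketch:

```python
def compare(vv_a, vv_b):
    """Compare two version vectors (dicts of site -> write counter)."""
    sites = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(s, 0) > vv_b.get(s, 0) for s in sites)
    b_ahead = any(vv_b.get(s, 0) > vv_a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        return "conflict"      # concurrent writes, neither saw the other
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


class Replica:
    def __init__(self, site):
        self.site = site
        self.files = {}        # path -> (data, version vector)

    def write(self, path, data):
        # A local write bumps this site's counter in the file's vector.
        _, vv = self.files.get(path, (None, {}))
        vv = dict(vv)
        vv[self.site] = vv.get(self.site, 0) + 1
        self.files[path] = (data, vv)

    def sync_from(self, other):
        """Pull other's files; return paths that need conflict resolution."""
        conflicts = []
        for path, (data, vv) in other.files.items():
            if path not in self.files:
                self.files[path] = (data, dict(vv))
                continue
            order = compare(self.files[path][1], vv)
            if order == "b_newer":
                self.files[path] = (data, dict(vv))
            elif order == "conflict":
                conflicts.append(path)
        return conflicts


a, b = Replica("A"), Replica("B")
a.write("notes.txt", b"v1")
first_sync = b.sync_from(a)         # B had nothing, so it just copies
a.write("notes.txt", b"v2 from A")  # both sites now write before syncing
b.write("notes.txt", b"v2 from B")
second_sync = b.sync_from(a)        # reports notes.txt as conflicting
```

This only detects the conflict. Deciding which version wins, or how to merge them, is still a policy choice the system has to make.
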
>
> This all means that basically you need to figure out which type of system
> you want first. Is data integrity the most important to you? If so, choose
> system #1 in a mode where if the link between A and B is broken (a
> partition happens), the system is offline.
>
> If high availability is the most important, choose system #1 in a mode
> where you resolve conflicts after the partition is over, or the downed data
> center comes back online.
>
> If high availability and high speed are most important, choose system #2.
>
> An alternative to all of this is where you only use one datacenter as the
> read-write one, and the other as a read-only one, and allow hot stand-by.
> This typically is the better choice, though depending on your setup a loss
> of a few seconds of writes may occur.
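
The "loss of a few seconds of writes" follows from the standby trailing the read-write site. A toy model, assuming the standby replays an ordered write log from the primary (the Primary and Standby classes and the limit parameter are hypothetical, made up here to simulate lag):

```python
class Primary:
    def __init__(self):
        self.log = []            # ordered (path, data) writes accepted so far

    def write(self, path, data):
        self.log.append((path, data))


class Standby:
    def __init__(self):
        self.applied = 0         # number of log entries replayed so far
        self.store = {}          # read-only copy served to clients

    def catch_up(self, primary, limit=None):
        # Replay entries not yet applied; limit models replication lag.
        pending = primary.log[self.applied:]
        if limit is not None:
            pending = pending[:limit]
        for path, data in pending:
            self.store[path] = data
        self.applied += len(pending)


primary, standby = Primary(), Standby()
for i in range(5):
    primary.write(f"file{i}", b"data")
standby.catch_up(primary, limit=3)   # standby has replayed 3 of 5 writes

# If the primary is lost now, promoting the standby loses the unshipped tail.
lost = len(primary.log) - standby.applied
```

Whatever sits in the log but has not reached the standby at failover time is what gets lost, so the size of that window depends on how far behind the standby is allowed to fall.
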
>
> Best of luck.
> Igor