[TriLUG] Replicating a filesystem across datacenters

Brian Cottingham spiffytech at gmail.com
Tue Nov 4 14:37:30 EST 2014


My employer is expanding from just one datacenter, adding a second as a
failover location, and we're unsure how to keep our file store in sync.

Our website generates tons of files and stores them to / serves them from
an NFS share that all of our web servers mount. This has worked
satisfactorily inside a single datacenter.

We originally planned to periodically rsync the data between the
datacenters, but rsync takes as much as 24 hours just to compare the local
and remote filesystems for differences. That's far too long for rapid
fail-over/fail-back, and it causes a lot of disk thrashing, so we're
looking for other options.
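One workaround we've sketched (not deployed, so treat it as a guess) is to stop making rsync walk the whole tree on both ends: pre-compute the set of recently changed files ourselves and hand it to rsync via `--files-from`, so only the delta gets compared. The function name and paths below are made up for illustration:

```python
import os

def changed_since(root, cutoff):
    """Walk `root` and return paths (relative to root) whose mtime is
    newer than `cutoff` (a Unix timestamp). The idea is to pipe this
    list to `rsync --files-from=-` so rsync never has to stat the
    entire tree on both the local and remote sides."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(full) > cutoff:
                    changed.append(os.path.relpath(full, root))
            except OSError:
                pass  # file vanished mid-walk; skip it
    return changed
```

Something like `changed_since("/srv/files", last_sync_ts)` (hypothetical path and timestamp variable) fed into rsync would shrink the comparison to recent writes, though it obviously misses deletions and relies on mtimes being trustworthy.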

We've considered just moving everything to Amazon S3, but that would
involve a substantial change to our application code, and we'd prefer
something less invasive on the app side if possible.

We need a solution that allows writes to occur in both datacenters, and
which will bring the datacenters back in sync after the failed one comes
back online. (In the case of conflicts, we can assume last write wins.)
That seems to rule out solutions as simple as shipping LVM snapshots.
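The last-write-wins rule itself is easy to state; here's a toy sketch of the reconciliation step, comparing per-file mtimes collected from each side (everything here is hypothetical naming, not a tool we use):

```python
def last_write_wins(side_a, side_b):
    """Merge two {path: mtime} maps, one per datacenter, keeping
    whichever side saw the most recent write for each path.
    Returns {path: (winner, mtime)} where winner is "a" or "b",
    so the caller knows which datacenter to copy each file from."""
    merged = {}
    for path in set(side_a) | set(side_b):
        a = side_a.get(path, float("-inf"))  # -inf: path absent on this side
        b = side_b.get(path, float("-inf"))
        merged[path] = ("a", a) if a >= b else ("b", b)
    return merged
```

The hard part, of course, isn't this merge but collecting trustworthy mtime inventories quickly from both sites, which is the same full-scan problem rsync has.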

Something like a distributed filesystem, or a sync daemon that watches
inotify, could be a good fit, but I'm not familiar with which options are
solid for mission-critical operations.
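To illustrate what I mean by the daemon idea: a real tool would subscribe to inotify events and ship changes as they happen, whereas this portable stand-in just diffs two mtime snapshots to recover the same create/modify/delete events. Purely a sketch of the concept, not a proposed implementation:

```python
import os

def scan_mtimes(root):
    """Snapshot {relative path: mtime} for every file under root."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                snap[os.path.relpath(full, root)] = os.path.getmtime(full)
            except OSError:
                pass  # deleted mid-scan
    return snap

def diff_snapshots(before, after):
    """Return (created_or_modified, deleted) between two snapshots --
    the events an inotify-based daemon would hand you for free,
    without the cost of rescanning the tree."""
    changed = sorted(p for p, m in after.items() if before.get(p) != m)
    deleted = sorted(p for p in before if p not in after)
    return changed, deleted
```

A sync loop would scan, diff against the previous snapshot, and push `changed` / replay `deleted` to the other datacenter; inotify replaces the scan step, which is exactly the part that doesn't scale here.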

Does anyone have recommendations for how we could tackle this problem?
