[TriLUG] Efficiency of rsync snapshot backups with 'rsback'
Jeremy Portzer
jeremyp at pobox.com
Mon Nov 8 23:12:32 EST 2004
Several months ago, I presented on rsync snapshot backups. I have a
few reflections on the efficiency of the "rsback" program that
implements this nifty idea.
To review, rsync snapshot backups are an innovative way to produce
incremental backups, saved to disk on a backup server. Instead of
performing a full backup, then incrementals, then another full backup, an
rsync snapshot backup effectively performs a full backup *every* time you
run it, capturing a "snapshot" of the system. However, using clever
tricks with hard links ("man ln"), you save space: only files that have
changed since the previous snapshot are stored again, and unchanged files
are simply hard links to the copy already on disk.[1]
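To make the hard-link trick concrete, here is a minimal Python sketch of one snapshot pass. It is not rsback's code; the function name make_snapshot is hypothetical, and it assumes a file is "unchanged" when the previous snapshot has a file at the same path with the same size and the same whole-second mtime (similar in spirit to rsync's quick check). It ignores ownership, directory permissions, and special files, and it does not expire old snapshots.

```python
import os
import shutil


def make_snapshot(src, prev, dest):
    """Copy the tree at src into the new snapshot directory dest.

    Hypothetical sketch: files that look unchanged compared with the previous
    snapshot prev (same relative path, same size, same whole-second mtime)
    are hard-linked instead of copied. Pass prev=None for the first snapshot.
    """
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in files:
            src_path = os.path.join(root, name)
            dest_path = os.path.join(dest, rel, name)
            st = os.stat(src_path)
            if prev is not None:
                prev_path = os.path.join(prev, rel, name)
                if os.path.isfile(prev_path):
                    pst = os.stat(prev_path)
                    if (pst.st_size == st.st_size
                            and int(pst.st_mtime) == int(st.st_mtime)):
                        # Unchanged: share the previous snapshot's inode.
                        os.link(prev_path, dest_path)
                        continue
            # New or changed: store a full copy, preserving the mtime so the
            # next snapshot's comparison still works.
            shutil.copy2(src_path, dest_path)
```

Every file in every snapshot directory is an ordinary file, which is why restoring needs nothing fancier than cp. In practice rsync can do the linking itself (for example with its --link-dest option) instead of a hand-rolled loop like this one.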
I demonstrated a system using a helper script called "rsback" (Google it).
We run this script on the TriLUG server to back up the home directories
and mail spools six times per day (every four hours).
Today, I was looking at the disk space used by the backups, and noticed
the following:
Amount of disk space used by home directory backups: 19868 MB
Amount of disk space that would be used without hard links (measured with
"du -mls", the "-l" option counts each link multiple times): 269962 MB
So, the "compression" ratio is 13.6 (269962 / 19868): the rsync system
stores more than 13 times as much data as the same disk could hold
without this incremental backup system!
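The two du figures differ only in whether a hard-linked file is counted once per link or once per inode. A rough Python equivalent of that measurement might look like the sketch below; link_usage and compression_ratio are hypothetical names, and it sums apparent file sizes (st_size) rather than allocated blocks, so its numbers won't match du exactly.

```python
import os


def link_usage(top):
    """Return (unique_bytes, counted_bytes) for all files under top.

    counted_bytes counts a hard-linked file once per link (like "du -l");
    unique_bytes counts each inode only once (like plain "du").
    Apparent sizes (st_size) are used instead of allocated blocks.
    """
    seen = set()
    unique_bytes = counted_bytes = 0
    for root, _dirs, files in os.walk(top):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            counted_bytes += st.st_size
            inode = (st.st_dev, st.st_ino)
            if inode not in seen:
                seen.add(inode)
                unique_bytes += st.st_size
    return unique_bytes, counted_bytes


def compression_ratio(unique_bytes, counted_bytes):
    """Space that would be used without hard links, divided by space actually used."""
    return counted_bytes / unique_bytes
```

With the home directory figures above, compression_ratio(19868, 269962) comes out to about 13.6.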
Now, for the IMAP directories (mail spools):
Amount of disk space used (du -ms): 25636 MB
Amount of space that would be used (du -mls): 45015 MB
In this case, the "compression" ratio is just 1.76. The reason is that
almost all of the mail spools are files that keep growing. Whenever a
file has changed since the last backup, a new copy of the whole file has
to be stored. Since the mail files change frequently, hard links often
cannot be used, and the space savings are much smaller.
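A toy model shows why a growing mailbox defeats the hard links. The sketch below (spool_storage is a hypothetical name) takes a mailbox's size at each snapshot, assumes the file only grows (no expunged mail), and treats any size change as "changed". It compares the space with no hard links at all, the space when each changed version is stored whole, and the space if only the newly appended bytes were kept.

```python
def spool_storage(sizes):
    """Model backup space for one mailbox file across snapshots.

    sizes[i] is the file's size in bytes at snapshot i; the model assumes
    the file only grows. Returns a tuple:
      without_links - bytes if every snapshot held its own full copy
      whole_file    - bytes when unchanged versions are hard-linked and each
                      changed version is stored in full (rsback-style)
      appended_only - bytes if only newly appended data were stored
    """
    without_links = sum(sizes)
    whole_file = appended_only = 0
    prev = None
    for size in sizes:
        if size != prev:  # changed since the last snapshot: a new copy is needed
            whole_file += size
            appended_only += size - (prev or 0)
        prev = size
    return without_links, whole_file, appended_only
```

A file that never changes, spool_storage([50, 50, 50]), gives (150, 50, 50): a ratio of 3, one per snapshot. A mailbox that grows at nearly every snapshot, spool_storage([100, 110, 120, 120, 130]), gives (580, 460, 130): a ratio of only about 1.26, even though just 130 bytes of distinct data exist.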
The mail spool problem could be solved in one of two ways: 1) switch to a
mail system that doesn't use large spool files, such as Maildir, or 2)
switch to a program like "rdiff-backup" that stores only the differences
between versions of a file, rather than copying the whole file each time.
The disadvantage of 1) is that switching mail systems is a lot of work,
and we like UW IMAP (in Blackbox mode) as configured. The disadvantage of
2) is that systems like rdiff-backup require special tools to recover the
backed-up files. Restoring from our rsback system requires nothing more
than "cp"!
Note that large, ever-growing files can be an issue with log files as
well. One solution is to rotate the logs frequently, so that the live log
(which is copied again whenever it changes) stays small, and to use a
rotation scheme that names past log files by date (rather than
messages.1, messages.2, etc.). You don't want the filenames of past logs
to change; otherwise the backup system will have to back them all up
again. Of course, if your web site doesn't get very many hits, and you
don't have too many messages in /var/log/messages, you won't have any big
problems with just leaving the rotation scheme as-is.
:-)
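Here is a small simulation of the two naming schemes, assuming one rotation per day, four files kept, and a hypothetical start date; log_files and changed_files are made-up names. It lists which filenames hold different contents than in the previous day's snapshot, i.e. which files the backup would have to copy again. (logrotate, for example, has a dateext option for date-style names.)

```python
from datetime import date, timedelta

START = date(2004, 11, 1)  # hypothetical first day of the simulation


def log_files(day, scheme, keep=4):
    """Return {filename: contents} as the log directory looks on the given day.

    The live log holds that day's entries; up to keep-1 older days survive as
    rotated files named messages.1, messages.2, ... ("numeric") or
    messages-YYYYMMDD ("date").
    """
    files = {"messages": f"entries for day {day}"}
    for age in range(1, keep):
        old = day - age
        if old < 0:
            break
        if scheme == "numeric":
            name = f"messages.{age}"
        else:  # "date"
            name = "messages-" + (START + timedelta(days=old)).strftime("%Y%m%d")
        files[name] = f"entries for day {old}"
    return files


def changed_files(prev, cur):
    """Names whose contents differ from the previous snapshot (must be re-copied)."""
    return sorted(name for name, data in cur.items() if prev.get(name) != data)
```

Comparing day 9 with day 10, the numeric scheme re-copies all four files (messages, messages.1, messages.2, messages.3) because every name now holds a different day's log, while the date scheme re-copies only messages and the newly rotated messages-20041110; the older dated files are hard-linked as usual.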
So, the rsback system isn't without its drawbacks, but overall it's a
great way to set up a disk-based backup server. It's a "set and forget"
system that doesn't rely on people moving tapes around, and it's
braindead-easy to restore files. If you don't have a comprehensive backup
system, I highly recommend you consider a disk-based backup server and
rsback or one of the other rsync snapshot backup systems.[2]
Recommended reading for more info:
http://www.mikerubel.org/computers/rsync_snapshots/
This article is also published in the O'Reilly book, _Linux Server Hacks_.
Hope this helps,
Jeremy
[1] Use this in conjunction with LVM snapshots to get a fully consistent
backup that can be used for a hassle-free "bare metal" restore.
[2] Note that for disaster recovery purposes, your backup server should be
located off-site. If this is impractical due to bandwidth considerations,
consider supplementing it with a tape drive that simply copies the backup
disk(s). The tapes should then be taken off-site.
--
/---------------------------------------------------------------------\
| Jeremy Portzer jeremyp at pobox.com trilug.org/~jeremy |
| GPG Fingerprint: 712D 77C7 AB2D 2130 989F E135 6F9F F7BC CC1A 7B92 |
\---------------------------------------------------------------------/