[TriLUG] copying files

David Both dboth at millennium-technology.com
Wed Jun 20 06:09:29 EDT 2012


The nice thing about rsync is that, for a file that already exists on the
receiving end, it sends only the differences by default when copying over the
network. It does not resend the whole file if only a few bytes have changed,
just the changed blocks. This is one of the reasons it is so efficient.
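
If you want to see this for yourself, here is a rough sketch rather than a
tuned recipe. The scratch directories and the file name big.dat are made up for
the example. One assumption to note: when both paths are local, rsync defaults
to --whole-file, so the sketch passes --no-whole-file to force the delta
algorithm that a network transfer would use by default.

```shell
#!/bin/sh
# Sketch only: scratch directories and file names are hypothetical.
src=$(mktemp -d)
dst=$(mktemp -d)

# Create a 1 MiB test file and do the initial full copy.
dd if=/dev/zero of="$src/big.dat" bs=1024 count=1024 2>/dev/null
rsync -a "$src/" "$dst/"

# Change the source slightly by appending a few bytes.
printf 'changed' >> "$src/big.dat"

# Local-to-local copies default to --whole-file, so request the delta
# algorithm explicitly. --stats reports how many bytes were sent as
# literal data and how many were matched against the existing copy.
rsync -a --no-whole-file --stats "$src/" "$dst/" |
    grep -E 'Literal data|Matched data'
```

On the second run the "Literal data" figure should be tiny compared with
"Matched data", which is the saving you get on a slow link.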


On 06/19/2012 11:41 PM, Sean Korb wrote:
> On Tue, Jun 19, 2012 at 10:15 PM, Joseph Mack NA3T <jmack at wm7d.net> wrote:
>> On Tue, 19 Jun 2012, Jeff Schornick wrote:
>>
>>> On Tue, Jun 19, 2012 at 9:39 PM, Joseph Mack NA3T <jmack at wm7d.net> wrote:
>>>> I haven't used rsync. So after the initial phase, both ends know the
>>>> files
>>>> at each end and when I add a new file at one end, rsync will notice and
>>>> just
>>>> handle it?
>>>
>>> Not quite.
>>>
>>> On each synchronization run, rsync creates a local list from the
>>> source directory, while simultaneously creating the analogous list on
>>> the remote end.  This means if you have 1000 files, you may be looking
>>> at 1000 fstats on each end.  However, these checks are both done
>>> locally on the corresponding machines.  As long as the target system's
>>> local file I/O isn't significantly slower than the source machine's,
>>> you shouldn't be introducing any additional delay.
>>>
>>> After both lists have been generated, rsync uses a minimal amount of
>>> network traffic to compare the lists and generate a final list of
>>> which files need to be updated.  As expected, only those files are
>>> sent over the network.
>>>
>>> After the synchronization is complete, the generated lists get tossed
>>> out as dirty laundry.  There is no long running daemon which attempts
>>> to keep them up-to-date in realtime.  However, I imagine someone has
>>> created a slick piece of code using inotify to do just that.
>>
>> OK, so I'd have to invoke rsync every 5 mins. Assembling the list of files
>> at each end has to be done anyhow (eg find). Presumably 1000 fstats take the
>> same time no matter whether find or rsync then processes the list. The
>> problem then is comparing the lists at each end.
>>
>> cp -auv is really slow
>>
>> rsync you say is fast (and I believe you).
>>
>> but I already have my list from `find`, so there's no extra cost if I use
>> find.
>>
>> The copy of the files takes the same time no matter which way I assembled
>> the list of files to be copied.
>>
>> So `find` followed by `cp --parents` or `cpio` seems to be it.
>>
>> Alan points out the resilience of rsync. This is a good feature, but as it
>> turns out (and I didn't say this), I don't mind losing an occasional file,
>> but throughput is high priority. The backup machine is writing files from
>> many sources and it only has a few seconds to service a source machine, or
>> it will fall over with the load.
> Use both?  rsync is pretty darned efficient even used atomically.
>
> find . -mmin -5 -type f -exec rsync -a {} /nfsdir/ \;
>
> or some crazy mess with xargs would be the proper way to do it.
>
> I think I have something buried somewhere that kind of does this...
> uses rsync to ship all the *differences* between two volumes to a
> third volume cutting down on space used for shipping hard drives of
> data back and forth using a FedEx truck.  I haven't used it in years
> so I'll have to do some digging.
>
> sean
>



-- 


*********************************************************
David P. Both, RHCE
Millennium Technology Consulting LLC
919-389-8678

dboth at millennium-technology.com

www.millennium-technology.com
www.databook.bz - Home of the DataBook for Linux
DataBook is a Registered Trademark of David Both