[TriLUG] Data manipulation over Samba

Andrew Perrin clists at perrin.socsci.unc.edu
Tue May 22 08:50:25 EDT 2007


Jim,

Not knowing exactly what your program is doing, it's hard to know how to 
speed it up. But the answer to your most basic question is, yes, if you're 
copying from one machine to another using a process on a third, the file 
is generally read over the link between the source machine and the 
mediating machine, then written over the link between the mediating machine 
and the destination machine.
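
For a rough sense of scale, here is a back-of-the-envelope calculation using only the figures from your message (215 images in 8 hours, about 76MB each, about 4,300 in total). It ignores the smaller files that accompany each image, and it assumes every byte crosses your desktop's link twice (once in, once out), so treat the results as approximate:

```python
# Figures taken from Jim's message; the smaller per-image files are ignored.
IMAGES_DONE = 215
IMAGE_SIZE_MB = 76
ELAPSED_HOURS = 8
TOTAL_IMAGES = 4300

elapsed_seconds = ELAPSED_HOURS * 3600

# Data copied so far, and the effective copy rate.
payload_mb = IMAGES_DONE * IMAGE_SIZE_MB
copy_rate_mb_s = payload_mb / elapsed_seconds

# Assumption: each byte is read to the desktop and written back out,
# so the desktop's link carries twice the payload.
link_rate_mb_s = 2 * copy_rate_mb_s

# Time for the whole collection at the observed pace.
hours_for_all = TOTAL_IMAGES / IMAGES_DONE * ELAPSED_HOURS

print(f"copied so far: {payload_mb} MB")
print(f"copy rate:     {copy_rate_mb_s:.2f} MB/s")
print(f"link traffic:  {link_rate_mb_s:.2f} MB/s")
print(f"full run:      {hours_for_all:.0f} hours")
```

At that pace the whole collection would take on the order of 160 hours, which is why cutting out a network hop is worth the effort.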

I can think of a few strategies you could use to speed this up:

- Use ssh to start a process on the source machine to do the copy to the 
destination machine so you bypass the "middleman"
- Use something better tuned than samba to do the mount (assuming you 
trust the network you're on, plain NFS might be fastest)
- Invest in gigabit ethernet links for the three machines
- Put a second network card in the mediating machine, thereby giving you 
full throughput between source and destination
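
The first strategy could be sketched roughly like this in python. This is only an illustration, not a drop-in replacement for your script: the function names, the host name and the paths are all made up, and it assumes you have key-based ssh access to a machine that sees both locations as local or NFS paths and has a POSIX shell and GNU cp:

```python
import shlex
import subprocess


def build_remote_copy(host, src, dst):
    """Build the ssh command line that runs cp on the remote host.

    host, src and dst are hypothetical; src and dst are paths as seen
    from the remote machine, not from the desktop.
    """
    # Quote the paths so that spaces survive the remote shell.
    remote_cmd = "cp -p -- {} {}".format(shlex.quote(src), shlex.quote(dst))
    return ["ssh", host, remote_cmd]


def remote_copy(host, src, dst):
    """Run the copy on the remote host; raises CalledProcessError on failure."""
    subprocess.run(build_remote_copy(host, src, dst), check=True)
```

Calling something like remote_copy("nfsbox", "/mnt/array/ortho/tile 001.tif", "/mnt/array/processed/") would then move the data without it ever passing through your desktop; only the ssh session itself crosses that link.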

Hope this helps-
Andy

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin (at) unc.edu - http://perrin.socsci.unc.edu
Assistant Professor of Sociology; Book Review Editor, _Social Forces_
University of North Carolina - CB#3210, Chapel Hill, NC 27599-3210 USA
New Book: http://www.press.uchicago.edu/cgi-bin/hfs.cgi/00/178592.ctl



On Tue, 22 May 2007, Jim Tuttle wrote:

> So, this has been bothering me. I'm hoping someone has an answer and,
> perhaps, a reference.
>
> Ok, I'm running some python data processing scripts against an
> orthophoto collection residing on a disk array in the basement.  There
> are about 4,300 images each about 76MB.  There are several smaller files
> with each image.  Part of the processing includes copying each file to
> another partition on our 14TB ATABeast.  The question is this: Is any of
> this data moving over the network to my machine?
>
> The processing is taking forever.  215 images in 8 hours.  I wondered if
> the images are being read into memory by my machine then written to the
> other partition on the array.  I have this fantasy that python tells the
> processor on the disk array to do the copying, but I imagine that isn't
> true.  To make matters worse, there are several connections through
> which this data traverses.  The array is mounted via fiber channel to a
> Solaris cluster which offers it to a linux machine in the cube next to
> me via NFS and I'm mounting that via samba on my desktop.
>
> I could have and probably should have run this on the intermediate
> machine, but wasn't thinking last night.  Neither the ATABeast nor the
> Solaris cluster have python installed and that's a non-starter.
>
> Thanks,
> Jim
> -- 
> --
> ---Jim Tuttle
> ------------------------------------------------------
> url: http://www.prairienet.org/~jtuttle/
> PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x69B69B08
>
> -- 
> TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ  : http://trilug.org/faq/
> TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
>


