[TriLUG] Clusters, performance, etc...

Michael Alan Dorman mdorman at debian.org
Mon Nov 7 15:25:40 EST 2005


Mark Freeze <mfreeze at gmail.com> writes:

> I have a friend who runs a business like mine and we have the same
> basic setup. We normally receive files from customers that may be 50
> to 100 MB. We run programs on these files that parse text, create
> databases, purge records, and so on. Normal database
> stuff. Converting and parsing records with the software that I have
> written usually runs for about 1 hour on the larger files and we may
> have 2 or 3 of these files each time a customer transmits data to us.

You haven't given enough information to even make a good guess.  To
make a good assessment, you would need to answer two questions:

 1. Are there dependencies between those files---that is, must you
    process A.txt before B.txt before C.txt?

 2. Is there some shared resource that would be required by all
    systems doing processing---that is, would all the data from all
    three have to be stored in a single database, or is the data for
    each totally independent?

> My friend says that he is considering clustering Linux boxes
> together to improve the speed of the processing and he figures that
> he can cut processing time in half. Now I may be in for a public
> spanking, but I did not think that clustering would have that much
> of an effect on this type of operation.

It could.  It very much depends on the nature of the job(s).

If your jobs are loosely coupled---that is, they don't have
dependencies and they don't make demands on the same resources at the
same time---then throwing more machines at the process could scale
very well.
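
As a rough illustration of the loosely coupled case, here is a minimal
sketch that estimates elapsed time when independent jobs are handed out
to whichever machine is free next.  The function name and the greedy
scheduling rule are my own assumptions for illustration, and it assumes
every machine is equally fast and the jobs share nothing, which may not
hold for a mixed pile of p3 and p4 boxes.

```python
import heapq


def wall_clock_minutes(job_minutes, machines):
    """Estimate elapsed time for independent jobs spread over machines.

    Greedy rule: take the longest remaining job and give it to the
    machine that becomes free first.  Assumes the jobs share no
    resources and all machines run at the same speed (hypothetical
    simplifications, not facts about any real setup).
    """
    if machines < 1:
        raise ValueError("need at least one machine")
    # Each heap entry is the time at which one machine becomes free.
    free_at = [0.0] * machines
    heapq.heapify(free_at)
    for minutes in sorted(job_minutes, reverse=True):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + minutes)
    # The batch is done when the last machine finishes.
    return max(free_at)


# Three one-hour files, as in the question.
serial = wall_clock_minutes([60, 60, 60], 1)      # one box does everything
clustered = wall_clock_minutes([60, 60, 60], 5)   # five boxes, three jobs
```

With three one-hour files, five machines finish no sooner than three
would: two boxes sit idle, and no single file gets any faster unless
the program itself can split one file's work.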

How well it scales is going to depend on what your current bottleneck
is: CPU, disk, or a shared database, for example.
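
One way to put a number on that is Amdahl's law: if only part of the
work can be spread across machines, the rest caps the speedup.  The
sketch below is a back-of-the-envelope estimate, not a measurement; the
function name is hypothetical, and the parallel fraction is something
you would have to measure on your own jobs.  It also assumes identical
machines and ignores network and coordination overhead.

```python
def amdahl_speedup(parallel_fraction, machines):
    """Upper bound on speedup by Amdahl's law.

    parallel_fraction: share of the run time (0.0 to 1.0) that can be
        split across machines; the remainder stays serial, for example
        writes to a single shared database.
    machines: number of equally fast machines (an assumption).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if machines < 1:
        raise ValueError("need at least one machine")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / machines)


# How much of the job must parallelize for 5 boxes to halve the time?
halved = amdahl_speedup(0.625, 5)
```

Under those assumptions, five machines cut the run time in half only if
at least 62.5% of the work can actually run in parallel; if most of the
hour is spent waiting on one shared resource, extra boxes buy very
little.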

> Also, he is not talking about clustering new, workhorse p4
> machines... He is talking about clustering up about 4 or 5 p3 & p4
> machines that he has as spares. From the things that I have read
> (including the link that someone posted the other day) I think that
> he has a misconception of clustering.
> Am I way off base? Will clustering have this dramatic of an effect?

Without more information, it's impossible to say.

Mike
-- 
The piano is firewood, Times Square is a dream -- Tom Waits


