[TriLUG] Clusters, performance, etc...

Mon Nov 7 18:27:39 EST 2005

I'm surprised that none of the earlier responses suggested getting 
off of a workstation platform.  The performance difference with a 
"real server" can be striking.  Disk IO, CPU to memory transfers, and 
network IO all generally go way up when you move to a more substantial 
hardware platform.

Of course there's a lot to it and, as others said, there isn't enough 
info in the original post to say for sure.

If the processing involves reading A.txt into RAM, doing stuff to it, 
and then cramming it into a database, you'd likely do well with an AMD 
solution; their CPU to RAM transfers are really fast.  In that case 
disk performance isn't as important.

If you are going to take A.txt, grep it, spit out several small 
files, do stuff to them, and then cram the results into a database, 
you'd get more bang from a really good RAID subsystem, and I don't mean 
software RAID.  I know that the argument of software vs. hardware RAID 
is second only to vi vs. Emacs, but if you want real speed my vote is 
for hardware.  LOTS of cache on the controller and disks that are 
engineered in conjunction with the controller; boy, does it make a 
difference.

Are you:

Disk bound?
Processor Bound?
Memory Bound?
Network Bound?
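
One rough way to start answering that is to compare wall-clock time 
with CPU time for a run.  The sketch below is only an illustration: 
cpu_fraction, classify, and the 0.8 threshold are made-up names and 
values, not anything from the original poster's software.  It only 
separates "busy on the CPU" from "waiting on something else"; a job 
that is memory bound still shows up as CPU time, so it won't tell 
those two apart.

```python
import time


def cpu_fraction(job):
    """Run job() and return (wall_seconds, cpu_seconds, cpu/wall ratio).

    process_time() counts CPU time of this process only, so time spent
    blocked on disk or network shows up as wall time without CPU time.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    job()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return wall, cpu, ratio


def classify(ratio, threshold=0.8):
    """Very coarse guess; the threshold is an arbitrary assumption."""
    if ratio >= threshold:
        return "likely processor bound"
    return "likely waiting on something else (disk, network, swap)"


if __name__ == "__main__":
    # Stand-in job: sleeping burns wall time but almost no CPU time,
    # much like a process blocked on IO would.
    wall, cpu, ratio = cpu_fraction(lambda: time.sleep(0.2))
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s ratio={ratio:.2f}")
    print(classify(ratio))
```

To use it on the real workload, pass the function that parses one 
file as job.  A low ratio points at disk or network; a high ratio 
points at processor or memory.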

All that said, you could still be better off clustering some 
workstations.  Then you have the cluster coding question: how do you 
best use the clustered systems?  Will you be creating a support 
headache that you will "own" for the rest of your days?
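
For a back-of-the-envelope feel for what a cluster can buy, Amdahl's 
law is a reasonable starting point.  This is a sketch under an 
idealized assumption (no network or coordination overhead, identical 
machines), and amdahl_speedup is just an illustrative name.  The 
parallel fraction of the real job is unknown here and would have to be 
measured.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Ideal speedup when only part of a job can be spread over nodes.

    parallel_fraction: share of the single-machine run time (0.0 to 1.0)
                       that can be split across machines.
    nodes:             number of machines doing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Hypothetical numbers: a 60 minute job on 4 machines, for several
    # guesses at how much of the work can actually be split up.
    for p in (0.25, 0.5, 2 / 3, 0.9):
        s = amdahl_speedup(p, 4)
        print(f"parallel={p:.2f}  speedup={s:.2f}  time={60 / s:.1f} min")
```

By this estimate, cutting the time in half on 4 machines needs about 
two thirds of the work to be splittable, and that is before any 
overhead for moving data between the boxes.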

Lots to think about......

	Kevin

Mark Freeze wrote:
> Someone please take my side and settle an argument for me.
>  I have a friend who runs a business like mine and we have the same basic
> setup. We normally receive files from customers that may be 50 to 100 MB. We
> run programs on these files that parse text, create databases, purge
> records, and so on. Normal database stuff. Converting and parsing records
> with the software that I have written usually runs for about 1 hour on the
> larger files and we may have 2 or 3 of these files each time a customer
> transmits data to us.
>  My friend says that he is considering clustering Linux boxes together to
> improve the speed of the processing and he figures that he can cut
> processing time in half. Now I may be in for a public spanking, but I did
> not think that clustering would have that much of an effect on this type of
> operation. Also, he is not talking about clustering new, workhorse p4
> machines... He is talking about clustering up about 4 or 5 p3 & p4 machines
> that he has as spares. From the things that I have read (including the link
> that someone posted the other day) I think that he has a misconception of
> clustering.
>  Am I way off base? Will clustering have this dramatic of an effect?
>  Thanks,
> Mark.