[TriLUG] Clusters, performance, etc...

Mon Nov 7 21:00:26 EST 2005

Mark,

It sounds like you may get some benefit from the parallelization that
cluster processing offers, but only if you rework your process so that
independent pieces of work (separate input files, or separate chunks of
records) can run at the same time.
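Here is a minimal sketch of what "reworking the process" could look like on a single multi-CPU box. It assumes each record in step 3 can be handled independently of the others. `correct_address` is a hypothetical placeholder for the vendor library call (here it only upper-cases the text). Whether the real vendor library is safe to call from several threads at once is unknown and would need checking. On a cluster the same idea applies at a coarser grain: hand each node its own input file or record range.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <future>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for the per-record work in step 3; the real
// vendor library call (and its thread safety) is not known here.
std::string correct_address(const std::string& record) {
    std::string out = record;
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return out;
}

// Process records [begin, end) and return the results in input order.
std::vector<std::string> process_chunk(const std::vector<std::string>& records,
                                       std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= records.size());
    std::vector<std::string> out;
    out.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i) out.push_back(correct_address(records[i]));
    return out;
}

// Split the records into `workers` contiguous chunks, run each chunk
// concurrently, then concatenate so output order matches input order.
std::vector<std::string> process_parallel(const std::vector<std::string>& records,
                                          std::size_t workers) {
    if (workers == 0) workers = 1;
    const std::size_t chunk = (records.size() + workers - 1) / workers;
    std::vector<std::future<std::vector<std::string>>> futures;
    for (std::size_t begin = 0; begin < records.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, records.size());
        futures.push_back(std::async(std::launch::async, process_chunk,
                                     std::cref(records), begin, end));
    }
    std::vector<std::string> result;
    result.reserve(records.size());
    for (auto& f : futures) {
        std::vector<std::string> part = f.get();
        result.insert(result.end(), part.begin(), part.end());
    }
    return result;
}
```

Keeping the chunks contiguous and collecting results in order means the output file comes out in the same order as the input, so the later append steps would not need to change.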

The current process seems like it would benefit from I/O speedups:
RAID, a disk controller with on-board cache, etc.

If you were to move to a server-class system, I'd still suggest that you
look at your process/code and see if you could do more in RAM.  Perhaps
you could do that on a desktop system with lots of RAM and get most of
the gains.  That would mean less hardware investment, but more coding
for you.
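As a sketch of doing more in RAM, assuming the input is plain newline-delimited text: read the whole 50-100 MB file with one bulk read and split it into lines in memory, rather than interleaving many small reads with processing. `slurp_file` and `split_lines` are illustrative names, not part of any existing code. How much this helps depends on where the time actually goes. If step 3 mostly waits on lookups into the vendor's database, keeping that data cached in memory would matter more, and that depends on what the vendor library allows.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read an entire file into memory with one bulk stream copy
// instead of many small line-by-line reads from disk.
std::string slurp_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Split an in-memory buffer into lines. A missing final newline is
// tolerated, and a trailing '\r' (DOS line endings) is stripped.
std::vector<std::string> split_lines(const std::string& data) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (start < data.size()) {
        std::size_t nl = data.find('\n', start);
        if (nl == std::string::npos) nl = data.size();
        std::string line = data.substr(start, nl - start);
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
        start = nl + 1;
    }
    return lines;
}
```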

Good luck!

	Kevin

Mark Freeze wrote:
> You guys are way ahead of me on some of the hardware questions... However,
> to try and answer some of them:
>  I have a script that controls the following actions:
>  1. Runs a C++ program that I wrote that opens a text file (the 50-100 MB
> file that I mentioned), reads each line sequentially, and splits the data
> into two output files after performing numerous tasks on the data (e.g.
> checking the validity of the zip code, making sure it matches the state,
> calculating amounts due, etc.)
>  2. Converts the second file into a dBase file
>  3. Runs another C++ program on the first file that examines each record in
> the file and compares it against another database (using proprietary code
> libraries supplied by our software vendor), correcting any bad info in the
> address, adding a ZIP+4, adding carrier route info, etc.
>  4. Looks for another text file to process
>  5. Appends all processed text files together
>  6. Appends all dbase files into one
>  As I said in my previous post, each 100 MB text file takes about 1 hour to
> process. Most of this time is spent on step 3.
>  So, would clustering speed up this process, which sometimes takes 3-4 hours?
>  Thanks,
> Mark.
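Since Mark reports that most of each hour goes to step 3, Amdahl's law gives a rough ceiling on what clustering could buy. This sketch assumes step 3 can be split perfectly across n machines (each handling its own input files or record ranges) and that everything else stays serial. The parallel fraction p is an assumption, since the post only says "most".

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when a fraction p of the runtime is
// spread evenly across n workers and the remaining (1 - p) stays serial.
// Ignores the cost of copying data to and from cluster nodes.
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}
```

With the purely hypothetical figure p = 0.9, four nodes give 1 / (0.1 + 0.225), or about 3.1x, so a one-hour run would take roughly 20 minutes. No number of nodes can do better than 1 / (1 - p) = 10x, and that is before counting the time spent moving files between machines.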