[TriLUG] Building a beowulf (or other) cluster

Ron Kelley rkelleyrtp at gmail.com
Mon Mar 28 13:55:06 EDT 2011


Thanks, everyone, for responding so quickly.

To avoid spilling the "secret sauce", here is an example of what we are doing (sorry if this is too long...):

* We have a PostgreSQL database with 100+ million records of automobile information, each indexed by a specific "auto ID" value

* Each record has the make, model, year, selling price, options, etc. for an auto when it was first sold as new (i.e., "off the showroom floor")

* We need to calculate a fair market value (FMV) for a particular *used* car using the database info with our modeling engine (a C++ program)

* The FMV calculation looks at the historical data for the auto (first sold as new, etc.) as well as other factors (condition of the auto, similarly priced autos, other autos in the same class, etc.) that impact the final FMV.


Our FMV model works very well, and I have automated the calculations using a shell script.  We simply pass the features of the used auto into our utility, and the model spits out the FMV.  Over the past week or so, we have tuned it so we can pass up to 20 used autos on a single command line and run these in parallel.  We have a server with dual 6-core Intel CPUs (3.3GHz) with HT enabled, which presents 24 logical CPUs to our Linux box.  Once we crank up the utility, we can run 24 "threads" with 20 autos in each thread, giving us 480 FMVs per run.  Currently, we get about 60-70 calculations per second.  Since the data in the database is static, no FMV calculation depends on any other calculation.
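For context, the local run is conceptually something like the following (a minimal sketch; "fmv_calc" and "autos.txt" are made-up names standing in for our real utility and input file):

    # Hypothetical sketch of the current single-server run.
    # autos.txt holds one used-auto spec per line; fmv_calc accepts
    # up to 20 specs as arguments and prints one FMV per spec.
    # GNU xargs: -d '\n' treats each line as one argument, -n 20
    # passes 20 args per invocation, -P 24 keeps 24 invocations
    # running at once (output lines from concurrent runs may
    # interleave).
    xargs -a autos.txt -d '\n' -n 20 -P 24 ./fmv_calc > fmv_results.txt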

Unfortunately, we have lots of data to process, and our only limitation at this time is CPU power.  However, since the FMV modeling tool seems to scale very well, we would like to expand the processing to additional servers (each server hosting its own copy of the DB).  We can create an input file for all used autos and have each server work on a subset.  The expectation is that the more servers we add, the faster we can get through all the used-auto calculations.
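The split step could be as simple as this (again just a sketch; the host names, paths, and the choice of 8 servers are placeholders):

    # Hypothetical: divide the master input into one chunk per server.
    # GNU split: -n l/8 makes 8 chunks without breaking lines,
    # -d gives numeric suffixes (chunk.00 .. chunk.07).
    split -n l/8 -d all_autos.txt chunk.

    # Copy one chunk to each server (host names are placeholders).
    i=0
    for host in fmv01 fmv02 fmv03 fmv04 fmv05 fmv06 fmv07 fmv08; do
        scp "chunk.0$i" "$host":/data/autos.txt
        i=$((i+1))
    done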

From the email below, the embarrassingly parallel workload idea is spot-on.  I just need to find the right controller software to handle the distribution of tasks to the other nodes.  Time for more research.
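GNU parallel looks like one candidate for that dispatcher role.  A minimal sketch, assuming fmv_calc is on the PATH of every node (untested, host names invented):

    # sshlogins.txt lists "slots/host", one per line, e.g.:
    #   24/fmv01
    #   24/fmv02
    # so each server runs at most 24 jobs at a time.
    # -n 20 passes 20 autos per invocation, as in the local run.
    parallel --sshloginfile sshlogins.txt -n 20 fmv_calc \
        < all_autos.txt > fmv_results.txt

The appealing part is that parallel streams each job's stdout back to the controller, so the results still land in a single file even though the work runs on many nodes.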


Hopefully, this makes more sense...

Thanks!



-Ron Kelley




On Mar 28, 2011, at 12:43 PM, Justis Peters wrote:

> On 03/28/2011 11:07 AM, Joseph Mack NA3T wrote:
>> On Mon, 28 Mar 2011, Ron Kelley wrote:
>> 
>>> I would like to install some sort of distributed process management tool so we can enable N-nodes to run the computations simultaneously.
>> 
>> You only use a Beowulf if the job cannot fit inside a single machine/node. This usually means the job needs more memory than a single node holds. If that is your situation, you then recode the app to use the nodes in parallel, usually with MPI or OpenMP.
>> 
>> If each job can be run in a single node, then you need a job farm (lots of machines with a job dispatcher).
> Ron,
> 
> I agree with Joe's take on your issue. You said, "Our processing happens in batch jobs and can easily be run on multiple servers at the same time." That sounds like an "embarrassingly parallel workload" (http://en.wikipedia.org/wiki/Embarrassingly_parallel), which is good news.
> 
> There are probably hundreds of solutions for your goal. Until we have more details, I'll begin by pointing you to Amazon's EC2. It provides simple tools to quickly scale up the size of your cluster. No need to buy hardware; you only pay for the time you use: http://aws.amazon.com/ec2/
> 
> When you say the project, "runs computational algorithms against some database data (sort of data mining)", it triggers a number of questions for me. What format is your data in? Is it already in a DBMS? How large is the data set? Can it be easily replicated between all the worker nodes? Do you need to update the data during calculations? Do other worker nodes need to also see those updates? Do you need features from the DBMS, such as indexes and aggregate functions, that would be a lot of work to replicate in external code? If so, how frequently do you need to use those features? Is your DBMS likely to become the bottleneck?
> 
> Best of luck with your project. Keep us posted.
> 
> Kind regards,
> Justis



