[TriLUG] Building a beowulf (or other) cluster

Igor Partola igor at igorpartola.com
Mon Mar 28 14:06:19 EDT 2011


You can just use Postgres as your synchronization mechanism and the
work queue. Let's say you have a table of vehicles. This table would
contain all the info about each vehicle's condition plus two more
columns: last_selected_on and processed_on (plus a value column, or
whatever output you'd need).
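As a rough sketch (table and column names here are just placeholders;
you'd keep whatever input columns your model actually needs), the
table might look like:

    CREATE TABLE vehicles (
        vehicle_id       bigserial PRIMARY KEY,
        -- make, model, year, condition, and the other model inputs go here
        fmv              numeric,       -- output value from the model
        last_selected_on timestamptz,   -- set when a node claims the row
        processed_on     timestamptz    -- set when the result is written back
    );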

Each server then runs a process that (1) locks the vehicles table, (2)
selects 20 vehicles at a time and updates their last_selected_on
timestamps, (3) unlocks the table. In this very short operation, each
server marks a batch of 20 vehicles as "I am going to process these."
Once it is done processing them, simply update each record with the
computed value and set processed_on to CURRENT_TIMESTAMP.
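A minimal version of that claim-and-finish cycle in Postgres might
look like this (untested sketch; the column names match the
placeholder table above and the literal values are made up):

    BEGIN;
    LOCK TABLE vehicles IN EXCLUSIVE MODE;

    -- Claim up to 20 unclaimed, unprocessed vehicles for this node.
    UPDATE vehicles
       SET last_selected_on = CURRENT_TIMESTAMP
     WHERE vehicle_id IN (
             SELECT vehicle_id
               FROM vehicles
              WHERE last_selected_on IS NULL
                AND processed_on IS NULL
              LIMIT 20)
    RETURNING vehicle_id;   -- hand these IDs to the local worker

    COMMIT;   -- releases the table lock

    -- Later, once the model has produced a value for one vehicle:
    UPDATE vehicles
       SET fmv = 12345.00,  -- whatever the model computed
           processed_on = CURRENT_TIMESTAMP
     WHERE vehicle_id = 42;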

The only caveat here is that some batches might get started but then
the server might crash. Thus, you probably want to periodically clear
out the value of last_selected_on (for example WHERE last_selected_on
< CURRENT_TIMESTAMP - INTERVAL '1 hour' AND processed_on IS NULL). You
could, for example, run this on every node every 10th time it selects
its batch of 20 vehicles.
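In Postgres syntax that cleanup pass could be something like this (the
one-hour window is of course arbitrary):

    -- Re-release batches that were claimed over an hour ago but never
    -- finished, e.g. because the claiming node died mid-batch.
    UPDATE vehicles
       SET last_selected_on = NULL
     WHERE last_selected_on < CURRENT_TIMESTAMP - INTERVAL '1 hour'
       AND processed_on IS NULL;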

This way the processing nodes are completely self-managed and you
don't need a separate controller. Just feed requests into the vehicles
table and wait for someone to process the results.
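On the feeding side it can be as simple as an INSERT followed by an
occasional poll for finished rows (again, the column values here are
made up):

    -- Queue another vehicle for valuation:
    INSERT INTO vehicles (make, model, year)
    VALUES ('Honda', 'Civic', 2007);

    -- Collect whatever has been finished so far:
    SELECT vehicle_id, fmv
      FROM vehicles
     WHERE processed_on IS NOT NULL;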

Igor

On Mon, Mar 28, 2011 at 1:55 PM, Ron Kelley <rkelleyrtp at gmail.com> wrote:
> Thanks everyone for responding so quickly.
>
> To avoid spilling the "secret sauce", here is an example of what we are doing (sorry if this is too long...):
>
> * We have a PostgreSQL database with 100+ million records of automobile information - each indexed by a specific "auto ID" value
>
> * Each record has the make, model, year, selling price, options, etc for an auto when it first sold as new (ie "off the showroom floor")
>
> * We need to calculate a fair market value (FMV) for a particular *used* car using the database info with our modeling engine (C++ program)
>
> * The FMV calculation looks at historical data of the auto (first sold as new, etc) as well as other factors (condition of auto, similarly priced autos, other autos in the same class, etc) that will impact the final FMV calculation.
>
>
> Our FMV model works very well, and I have automated the calculations using a shell script.  We simply pass the features of the used auto into our utility, and the model spits out the FMV.  Over the past week or so, we have tuned it so we can pass in up to 20 used autos on a single command line and run these in parallel.  We have a server with dual 6-core Intel CPUs (3.3GHz) with HT enabled that provides 24 CPUs to our Linux box.  Once we crank up the utility, we can get 24 "threads" with 20 autos in each thread - thereby giving us 480 FMVs per run.  Currently, we can get about 60-70 calculations per second.  Since the data in the database is static, each FMV calculation does not rely upon any other calculation.
>
> Unfortunately, we have lots of data to process, and our only limitation at this time is CPU power.  However, since the FMV modeling tool seems to scale very well, we would like to expand the processing to additional servers (each server hosting its own DB).  We can create an input file for all used autos and have each server work on a subset.  The expectation is that the more servers we add, the faster we can get through all the used-auto calculations.
>
> From the email below, the embarrassingly parallel workload idea is spot-on.  I just need to find the right controller software to handle the distribution of tasks to the other nodes.  Time for more research.
>
>
> Hopefully, this makes more sense...
>
> Thanks!
>
>
>
> -Ron Kelley
>
>
>
>
> On Mar 28, 2011, at 12:43 PM, Justis Peters wrote:
>
>> On 03/28/2011 11:07 AM, Joseph Mack NA3T wrote:
>>> On Mon, 28 Mar 2011, Ron Kelley wrote:
>>>
>>>> I would like to install some sort of distributed process management tool so we can enable N-nodes to run the computations simultaneously.
>>>
>>> You only use a beowulf if the job cannot fit inside a single machine/node. This usually means that the job needs more memory than a single node holds. If this is your situation, you then recode the app to use the nodes in parallel. This usually means using mpi or omp.
>>>
>>> If each job can be run in a single node, then you need a job farm (lots of machines with a job dispatcher).
>> Ron,
>>
>> I agree with Joe's take on your issue. You said, "Our processing happens in batch jobs and can easily be run on multiple servers at the same time." That sounds like an "embarrassingly parallel workload" (http://en.wikipedia.org/wiki/Embarrassingly_parallel), which is good news.
>>
>> There are probably hundreds of ways to reach your goal. Until we have more details, I'll begin by pointing you to Amazon's EC2. It provides simple tools to quickly scale up the size of your cluster. No need to buy hardware. You only pay for the time you use: http://aws.amazon.com/ec2/
>>
>> When you say the project, "runs computational algorithms against some database data (sort of data mining)", it triggers a number of questions for me. What format is your data in? Is it already in a DBMS? How large is the data set? Can it be easily replicated between all the worker nodes? Do you need to update the data during calculations? Do other worker nodes need to also see those updates? Do you need features from the DBMS, such as indexes and aggregate functions, that would be a lot of work to replicate in external code? If so, how frequently do you need to use those features? Is your DBMS likely to become the bottleneck?
>>
>> Best of luck with your project. Keep us posted.
>>
>> Kind regards,
>> Justis
>
>


