[TriLUG] Building a beowulf (or other) cluster
Aaron Joyner
aaron at joyner.ws
Mon Mar 28 17:00:02 EDT 2011
The database solution will indeed work, but requires baking your
worker process to basically be a job control problem. If you want to
process repeatedly-added data, that's potentially very useful. If you
just want to start this and run it once until the data consumption is
done, a batch-processing model might be cleaner. Embarrassingly
parallel solutions usually lend themselves well to the MapReduce
model:
In the general sense:
http://en.wikipedia.org/wiki/MapReduce
in particular, in the open source world:
http://en.wikipedia.org/wiki/Hadoop
Aaron S. Joyner
On Mon, Mar 28, 2011 at 2:19 PM, Ron Kelley <rkelleyrtp at gmail.com> wrote:
> Very interesting idea. Essentially, we would run a script on each server in a while loop that continually asks the DB for more records to process. We can probably add some other information to the DB so we can get timing data, etc (how many left to process, how fast we are processing, etc).
>
> Let me mull over the details and see if we can implement something like this...
>
> Thanks for the great idea!
>
>
> -Ron
>
>
>
> On Mar 28, 2011, at 2:06 PM, Igor Partola wrote:
>
>> You can just use Postgres as your synchronization mechanism and the
>> work queue. Let's say you have a table of vehicles. This table would
>> contain all the info about the vehicle conditions + two more columns:
>> last_selected_on and processed_on (+ value, or whatever output you'd
>> need).
>>
>> Each server then runs a process that (1) locks the vehicles table, (2)
>> selects 20 vehicles at a time and updates their last_selected_on
>> timestamps, (3) unlocks the table. In this very short operation, each
>> server marks a batch of 20 vehicles as "I am going to process these."
>> Once it is done processing them, simply update each record with the
>> value and set the processed_on to CURRENT_TIMESTAMP.
>>
>> The only caveat here is that some batches might get started but then
>> the server might crash. Thus, you probably want to periodically clear
>> out the value of last_selected_on (for example WHERE last_selected_on
>> < CURRENT_TIMESTAMP - INTERVAL 1 HOUR AND processed_on IS NULL). You
>> could for example run this on every node every 10th time it is
>> selecting its batch of 20 vehicles.
>>
>> This way the processing nodes are completely self-managed and you
>> don't need a separate controller. Just feed requests into the vehicles
>> table and wait for someone to process the results.
>>
>> Igor
>>
>> On Mon, Mar 28, 2011 at 1:55 PM, Ron Kelley <rkelleyrtp at gmail.com> wrote:
>>> Thanks everyone for responding so quickly.
>>>
>>> To avoid spilling the "secret sauce", here is an example of what we are doing (sorry if this is too long...):
>>>
>>> * We have a Postgresql database with +100million records of automobile information - each indexed by a specific "auto ID" value
>>>
>>> * Each record has the make, model, year, selling price, options, etc for an auto when it first sold as new (ie "off the showroom floor")
>>>
>>> * We need to calculate a fair market value (FMV) for a particular *used* car using the database info with our modeling engine (C++ program)
>>>
>>> * The FMV calculation looks at historical data of the auto (first sold as new, etc) as well as other factors (condition of auto, similarly priced autos, other autos in the same class, etc) that will impact the final FMV calculation.
>>>
>>>
>>> Our FMV model works very well, and I have automated the calculations using a shell script. We simply pass in the features of the used auto into our utility, and the model spits out the FMV. Over the past week or so, we have tuned it so we can pass in up to 20 used autos on a single command line and run these in parallel. We have a server with dual 6-core Intel CPUs (3.3GHz) with HT enabled that provides 24 CPUs to our Linux box. Once we crank up the utility, we can get 24 "threads" with 20 autos in each thread - thereby giving us 480 FMVs per run. Currently, we can get about 60/70 calculations per second. Since the data in the database is static, each FMV calculation does not rely upon any other calculation.
>>>
>>> Unfortunately, we have lots of data to process, and our only limitation at this time is CPU power. However, since the FMV modeling tool seems to scale very well, we would like to expand the processing to additional servers (each server hosting its own DB). We can create an input file for all used autos and have each server work on a subset. The expectation is - the more servers we add, the faster we can get through all the used auto calculations.
>>>
>>> From the email below, the embarrassingly parallel workload idea is spot-on. I just need to find the right controller software to handle the distribution of tasks to the other nodes. Time for more research
>>>
>>>
>>> Hopefully, this makes more sense...
>>>
>>> Thanks!
>>>
>>>
>>>
>>> -Ron Kelley
>>>
>>>
>>>
>>>
>>> On Mar 28, 2011, at 12:43 PM, Justis Peters wrote:
>>>
>>>> On 03/28/2011 11:07 AM, Joseph Mack NA3T wrote:
>>>>> On Mon, 28 Mar 2011, Ron Kelley wrote:
>>>>>
>>>>>> I would like to install some sort of distributed process management tool so we can enable N-nodes to run the computations simultaneously.
>>>>>
>>>>> You only use a beowulf if the job cannot fit inside a single machine/node. This usually means that the job needs more memory than a single node holds. If this is your situation, you then recode the app to use the nodes in parallel. This usually means using mpi or omp.
>>>>>
>>>>> If each job can be run in a single node, then you need a job farm (lots of machines with a job dispatcher).
>>>> Ron,
>>>>
>>>> I agree with Joe's take on your issue. You said, "Our processing happens in batch jobs and can easily be run on multiple servers at the same time." That sounds like an "embarrassingly parallel workload" (http://en.wikipedia.org/wiki/Embarrassingly_parallel), which is good news.
>>>>
>>>> There are probably hundreds of solutions to your goal. Until we have more details, I'll begin by pointing you to Amazon's EC2. It provides simple tools to quickly scale up the size of your cluster. No need to buy hardware. You only pay for the time you use: http://aws.amazon.com/ec2/
>>>>
>>>> When you say the project, "runs computational algorithms against some database data (sort of data mining)", it triggers a number of questions for me. What format is your data in? Is it already in a DBMS? How large is the data set? Can it be easily replicated between all the worker nodes? Do you need to update the data during calculations? Do other worker nodes need to also see those updates? Do you need features from the DBMS, such as indexes and aggregate functions, that would be a lot of work to replicate in external code? If so, how frequently do you need to use those features? Is your DBMS likely to become the bottleneck?
>>>>
>>>> Best of luck with your project. Keep us posted.
>>>>
>>>> Kind regards,
>>>> Justis
>>>> --
>>>> This message was sent to: Ron Kelley <rkelleyrtp at gmail.com>
>>>> To unsubscribe, send a blank message to trilug-leave at trilug.org from that address.
>>>> TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
>>>> Unsubscribe or edit options on the web : http://www.trilug.org/mailman/options/trilug/rkelleyrtp%40gmail.com
>>>> TriLUG FAQ : http://www.trilug.org/wiki/Frequently_Asked_Questions
>>>
>>> --
>>> This message was sent to: Igor Partola <igor at igorpartola.com>
>>> To unsubscribe, send a blank message to trilug-leave at trilug.org from that address.
>>> TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
>>> Unsubscribe or edit options on the web : http://www.trilug.org/mailman/options/trilug/igor%40igorpartola.com
>>> TriLUG FAQ : http://www.trilug.org/wiki/Frequently_Asked_Questions
>>>
>> --
>> This message was sent to: Ron Kelley <rkelleyrtp at gmail.com>
>> To unsubscribe, send a blank message to trilug-leave at trilug.org from that address.
>> TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
>> Unsubscribe or edit options on the web : http://www.trilug.org/mailman/options/trilug/rkelleyrtp%40gmail.com
>> TriLUG FAQ : http://www.trilug.org/wiki/Frequently_Asked_Questions
>
> --
> This message was sent to: Aaron S. Joyner <aaron at joyner.ws>
> To unsubscribe, send a blank message to trilug-leave at trilug.org from that address.
> TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
> Unsubscribe or edit options on the web : http://www.trilug.org/mailman/options/trilug/aaron%40joyner.ws
> TriLUG FAQ : http://www.trilug.org/wiki/Frequently_Asked_Questions
>
More information about the TriLUG
mailing list