[TriLUG] Building a beowulf (or other) cluster
Justis Peters
jtrilug at indythinker.com
Mon Mar 28 12:43:07 EDT 2011
On 03/28/2011 11:07 AM, Joseph Mack NA3T wrote:
> On Mon, 28 Mar 2011, Ron Kelley wrote:
>
>> I would like to install some sort of distributed process management
>> tool so we can enable N-nodes to run the computations simultaneously.
>
> You only use a beowulf if the job cannot fit inside a single
> machine/node. This usually means that the job needs more memory than a
> single node holds. If this is your situation, you then recode the app
> to use the nodes in parallel. This usually means using mpi or omp.
>
> If each job can be run in a single node, then you need a job farm
> (lots of machines with a job dispatcher).
Ron,

I agree with Joe's take on your issue. You said, "Our processing happens
in batch jobs and can easily be run on multiple servers at the same
time." That sounds like an "embarrassingly parallel workload"
(http://en.wikipedia.org/wiki/Embarrassingly_parallel), which is good
news: the jobs do not need to coordinate with each other, so a job farm
like the one Joe describes should be enough.
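
As a rough illustration of the pattern (not a recommendation of a
specific tool), here is a minimal Python sketch of a job farm. The names
process_batch and run_job_farm and the sample data are hypothetical, and
a local thread pool merely stands in for the worker nodes. For real
CPU-bound work you would swap in a process pool or a proper batch
scheduler across machines.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch(batch_id, rows):
    """Hypothetical stand-in for one independent batch job."""
    return batch_id, sum(x * x for x in rows)


def run_job_farm(batches, max_workers=4):
    """Dispatch every batch to the pool and collect results by batch id.

    The thread pool only stands in for the worker nodes (an assumption
    made for illustration); the dispatch pattern is the point here.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_batch, batch_id, rows)
                   for batch_id, rows in batches.items()]
        # Jobs may finish in any order; none depends on another's result.
        for future in as_completed(futures):
            batch_id, total = future.result()
            results[batch_id] = total
    return results


if __name__ == "__main__":
    sample = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
    print(run_job_farm(sample))
```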

There are probably hundreds of ways to reach your goal. Until we have
more details, I'll begin by pointing you to Amazon's EC2. It provides
simple tools to quickly scale up the size of your cluster, with no need
to buy hardware. You only pay for the time you use:
http://aws.amazon.com/ec2/

When you say the project "runs computational algorithms against some
database data (sort of data mining)", it raises a number of questions
for me. What format is your data in? Is it already in a DBMS? How large
is the data set? Can it be easily replicated between all the worker
nodes? Do you need to update the data during calculations? Do other
worker nodes also need to see those updates? Do you need features from
the DBMS, such as indexes and aggregate functions, that would be a lot
of work to replicate in external code? If so, how frequently do you need
to use those features? Is your DBMS likely to become the bottleneck?

Best of luck with your project. Keep us posted.

Kind regards,
Justis