[Spread-users] Is this a reasonable way to use spread?

Mon Oct 7 09:55:06 EDT 2002

Hi,

As I've written before, we're using spread in our bioinformatics
platform to distribute user requests for computations from a
persistent, database-based queue onto a cluster of Linux or Solaris
servers.  We are anticipating server clusters of up to a few hundred
machines, each of which runs one controller process and, say, two to
five jobs simultaneously for a maximum of 1200 application processes
(200 controllers, 1000 jobs) in total.

The current design places all of these processes in a single group;
the controllers use the group ordering trick described in the WALRUS
paper (http://www.cnds.jhu.edu/pub/papers/dcs.pdf) as a sort of
distributed mutex to designate one controller as the queue master.
The controllers also use the group membership messages to maintain a
replicated view of the state of the cluster.
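
To make the master selection concrete, here is a minimal sketch of one
reading of that trick. It assumes every member receives the group's
member list in the same order, so each controller can apply the same
rule locally and reach the same answer without extra messages. The
class name, the "#ctl" naming convention for controller connections,
and the sample names are hypothetical; the sketch does not call the
spread API.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the spread API.
final class QueueMasterElection {
    // Assumption: controllers connect with private names that start with
    // this prefix, so they can be told apart from job processes that
    // share the same group.
    static final String CONTROLLER_PREFIX = "#ctl";

    // The first controller in the agreed member order is the queue master.
    static Optional<String> electMaster(List<String> orderedMembers) {
        return orderedMembers.stream()
                .filter(m -> m.startsWith(CONTROLLER_PREFIX))
                .findFirst();
    }

    // Each controller evaluates this after every membership change.
    static boolean isMaster(String self, List<String> orderedMembers) {
        return electMaster(orderedMembers).map(self::equals).orElse(false);
    }

    public static void main(String[] args) {
        // Made-up membership view: one job and two controllers.
        List<String> view =
                List.of("#job17#node02", "#ctl02#node02", "#ctl01#node01");
        System.out.println(electMaster(view).orElse("(no controller)"));
    }
}
```

When the current master leaves the group, the next membership message
produces a new list and the same rule picks the successor.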

From what I've read about spread, this all appears to be well within
its scalability parameters.

Our client software is written in Java, and typically runs on a
Windows workstation where *no* spread daemon is running
(theoretically, it could run from anywhere on the web).
The client has to allow the user to monitor the state of his own
job requests.  For jobs which are running, he should be provided
with periodic, detailed information on the state of the job.
One idea I had is to place each job process into its own spread group
and have clients which are interested in a specific job just join
the job's group.  Then the detailed state information can be broadcast
to the group using either a push or a pull technique.
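
The per-job group idea with the push variant can be modelled roughly
as follows. This is only an in-memory stand-in to show the shape of
the design: the class, the "job-<id>" group naming scheme, and the
methods are all hypothetical, and nothing here uses the spread API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory model of "one group per job"; hypothetical, not spread itself.
final class JobGroupModel {
    private final Map<String, List<Consumer<String>>> groups = new HashMap<>();

    // Assumed naming scheme: one group per job, derived from the job id.
    static String groupFor(long jobId) {
        return "job-" + jobId;
    }

    // A client interested in a job joins that job's group.
    void join(String group, Consumer<String> listener) {
        groups.computeIfAbsent(group, g -> new ArrayList<>()).add(listener);
    }

    // Push: the job sends its state to everyone currently in its group.
    // Returns the number of listeners reached.
    int push(String group, String state) {
        List<Consumer<String>> members = groups.getOrDefault(group, List.of());
        for (Consumer<String> member : members) {
            member.accept(state);
        }
        return members.size();
    }

    public static void main(String[] args) {
        JobGroupModel model = new JobGroupModel();
        model.join(groupFor(42), s -> System.out.println("client saw: " + s));
        model.push(groupFor(42), "alignment step 3 of 10");
    }
}
```

In the pull variant a client would instead send a request to the job's
group and the job would reply only to the requester.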

I see two potential problems: 1) the number of groups can expand to
upwards of 1,000, and 2) the clients will be connecting to the spread
network from nodes outside of the network which are not running their
own daemons.  In the extreme, this might result in a single daemon
having to handle connections to several hundred clients as well
as the jobs on its cluster node.  I can't tell from the papers and
documentation available whether this is feasible and within spread's
capabilities or not.

I'd appreciate any thoughts experienced spread users (or its
designers) might have on this.

Thanks, Jim

-- 
------------------------------------------------------------------------
Jim Rauser                                          Science Factory GmbH
mailto:j.rauser at science-factory.com                       Unter Käster 1
Tel: +49 221 277 399 204                          50667 Cologne, Germany