[Spread-users] Question on Spread for high performance database application

Mon Feb 6 16:06:29 EST 2006

Hi,

We are evaluating Spread as a replacement for MPI for a
high-throughput database application running on a cluster of Linux
boxes with GigE as the interconnect.  Our application may need to scale
anywhere from several hundred to 1000 nodes.

Specifically, we are considering Spread for the following purposes:

a) Providing a cluster membership service. This is where MPI
currently fails us, as it does not tolerate a node in the cluster going
down.  Spread is a clear choice here.

b) Point-to-point communication between nodes.  The alternative
for us is to roll our own using plain TCP sockets.  A significant
portion of our data traffic is point-to-point and we need high data
throughput here. We are concerned about the extra hop of going through
the Spread daemon.
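
As an illustration of the raw TCP baseline mentioned in b), here is a
minimal sketch that times a bulk transfer over a loopback socket. It
does not use Spread at all; the function name, the transfer size and
the chunk size are hypothetical choices, and loopback numbers say
nothing about GigE behaviour between real nodes.

```python
import socket
import threading
import time


def measure_tcp_throughput(total_bytes=16 * 1024 * 1024, chunk_size=64 * 1024):
    """Send total_bytes over a loopback TCP connection.

    Returns (bytes_received, megabytes_per_second).
    All sizes are illustrative assumptions, not tuned values.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = [0]  # mutable cell so the receiver thread can update it

    def receiver():
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk_size)
                if not data:  # sender closed the connection
                    break
                received[0] += len(data)

    thread = threading.Thread(target=receiver)
    thread.start()

    payload = b"x" * chunk_size
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(chunk_size, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    thread.join()  # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()
    return received[0], received[0] / elapsed / 1e6


if __name__ == "__main__":
    got, rate = measure_tcp_throughput()
    print(f"received {got} bytes at {rate:.1f} MB/s")
```

Running the same transfer through a Spread client and daemon and
comparing the two rates would give a first estimate of the cost of the
extra hop.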

Some questions we have are:

a) Are there any studies on the network performance of Spread vs.
sockets for large data transfers?

b) How much overhead might Spread introduce in terms of CPU usage?

c) There is a limit of 128 Spread daemons. This means we may need to
have one daemon serving a set of nodes in our cluster. What is the
impact of the daemon running on the same node as the client vs. on
another node?

d) Are there any studies comparing Spread to MPI?
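
To make the arithmetic behind question c) concrete, here is a rough
sketch. It assumes only the 128-daemon limit and the node counts
mentioned above; the function names are hypothetical and it uses no
Spread API.

```python
import math

MAX_DAEMONS = 128  # daemon limit cited in question c)


def nodes_per_daemon(num_nodes, max_daemons=MAX_DAEMONS):
    """Smallest group size such that num_nodes fit under the daemon limit."""
    return math.ceil(num_nodes / max_daemons)


def daemons_needed(num_nodes, group_size):
    """Number of daemons when each daemon serves up to group_size nodes."""
    return math.ceil(num_nodes / group_size)


if __name__ == "__main__":
    for n in (128, 300, 1000):
        g = nodes_per_daemon(n)
        print(f"{n} nodes: {g} node(s) per daemon, {daemons_needed(n, g)} daemons")
```

At the 1000-node end this gives 8 nodes per daemon and 125 daemons, so
most clients would talk to a daemon on a different node.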

We are doing our own experiments to answer these questions, but it
would be great to hear about others' experiences or general comments on
the suitability of Spread for this type of application.

Thank you for your help!

Shilpa Lawande.