[Spread-users] more performance questions

Ciprian Tutu ciprian at jhu.edu
Thu Apr 29 09:53:16 EDT 2004


Hi Greg,

I am replying to you since you wrote the latest e-mail on this topic,
but I will try to give a more general answer, since I've seen the same
problem happening to a few others in the past weeks. Maybe we should
add an explanation like this to the FAQ :). I'll start with the
context of your problem.


GS> i have about 325 clients that connect to a single spread daemon... i'd
GS> say that about 75 to 80 percent of these clients are "send-only"
GS> clients... the rest are clients that "recv-only" and display information
GS> that the "send-only" clients are collecting and sending...
GS> ...
GS> i've been considering adding additional spread daemons to help with
GS> loading if it is, indeed, the case that one daemon cannot cope with what
GS> i'm asking of it...


Spread can help, generally speaking, in two ways. One is the
efficient distribution of messages to groups of clients with ordering
and/or reliability guarantees. The second is the actual
management of multiple groups of clients and a clear interface for
addressing the groups.

With a setup like yours - one single Spread daemon for multiple
clients - you are only taking advantage of the second feature: a
simple interface for various group multicast primitives.

Spread, due to the way it is designed - as a tiered client-daemon
architecture, as opposed to client libraries - cannot do any magic in
terms of efficiency unless several Spread daemons are deployed. One
main reason behind this architecture is scalability. Having the group
communication "intelligence" in client libraries would imply having
all clients in the system coordinate for message delivery guarantees.
This method would clearly not scale to a large number of clients.
Therefore we view Spread as an infrastructure, an overlay network,
that is to be deployed in key points in order to provide an enhanced
service to a large number of clients which communicate with each other
through their connection to the Spread network.

In a local area network setup, having one single Spread daemon means
that all the sender clients will unicast their messages to the Spread
daemon through the LAN, the daemon will determine the ordering and
maintain the reliability guarantees, and it will then deliver the
messages to each receiving client, again through a unicast connection.
This setup does not provide any performance enhancement; on the
contrary, it can saturate your network with "redundant" messages. On
the other hand, if you have one Spread daemon for each machine that
hosts clients (I imagine that those 325 clients that you mention are
not each running on individual machines) then the message delivery
will happen in an optimized way among the Spread daemons, followed by
"local" delivery from each daemon to its clients, without any
additional network traffic. This way the _only_ traffic in your
network would be the inter-daemon Spread traffic, rather than having a
slew of unicast connections from clients to daemons over the network.
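
To put rough numbers on the paragraph above, here is a small
back-of-the-envelope sketch in Python. It is only a model, not anything
taken from Spread itself. It assumes that every receiver is subscribed
to the group, that all daemons sit in one LAN segment where a data
message goes out as a single multicast/broadcast packet, and it ignores
protocol control traffic (tokens, acknowledgements, retransmissions).
The function names and the figures of 80 receivers and 10 machines are
made up for illustration.

```python
def single_daemon_transmissions(num_receivers, sender_is_remote=True):
    """LAN packets needed to deliver ONE application message when all
    clients connect over the network to a single daemon.

    Assumption: every receiver lives on a different host than the daemon.
    """
    inbound = 1 if sender_is_remote else 0   # sender -> daemon unicast
    outbound = num_receivers                 # daemon -> each receiver unicast
    return inbound + outbound


def per_machine_daemon_transmissions(num_machines):
    """LAN packets needed to deliver ONE application message when each
    client machine runs its own daemon.

    Assumption: one multicast/broadcast packet reaches all other daemons;
    client <-> local daemon traffic never touches the network.
    """
    return 1 if num_machines > 1 else 0


if __name__ == "__main__":
    receivers = 80   # hypothetical: roughly a quarter of ~325 clients
    machines = 10    # hypothetical number of hosts running clients
    print("single daemon    :", single_daemon_transmissions(receivers))
    print("daemon per machine:", per_machine_daemon_transmissions(machines))
```

Under these assumptions each message costs 81 packets on the LAN with
one shared daemon and a single packet with a daemon on every machine.
The real numbers will differ, but the gap grows with the number of
receivers.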

In Wide Area Network settings, depending on the actual latency and
bandwidth of your setup, it _may_ make sense to have just one daemon
per site (i.e. per LAN), as low WAN bandwidth and high latency may be
the only bottlenecks for your system. If that is the case, the extra
traffic you might be generating on your LAN may not noticeably affect
your performance. However, if the sites are connected with very good
links and you have a large number of nodes in each LAN, then it
again becomes useful to have several Spread daemons in each LAN, in
order to optimize the LAN part of the communication as
described in the previous paragraph.

I hope this clarifies a bit the reasoning behind using multiple
Spread daemons, which may seem counterintuitive at first to some.

Ciprian




