[Spread-users] Re: how to scale

Tue May 15 11:01:38 EDT 2001

Hello,

	Some comments on this. 

	First, the standard solution that works today is to have your
configuration file list some "extra" IP addresses that have no daemon
running on them. If you later need more daemons, you just start them on
those IP addresses and they are dynamically added to the Spread system.
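As a rough illustration of the "extra IP addresses" idea, here is a small Python sketch that writes one segment of a configuration file with spare slots already listed. It assumes the `Spread_Segment multicast_addr:port { name ip ... }` layout used by the sample `spread.conf` of Spread 3.x; check it against the sample config shipped with your version. The helper names, host names and addresses are all hypothetical.

```python
def render_segment(mcast_addr, port, active, spares):
    """Render a spread.conf-style Spread_Segment block (hypothetical helper).

    active: (name, ip) pairs for hosts that already run a daemon.
    spares: (name, ip) pairs that are listed now but have no daemon yet.
    """
    lines = [f"Spread_Segment {mcast_addr}:{port} {{"]
    for name, ip in active:
        lines.append(f"\t{name}\t{ip}")
    if spares:
        # Comment line marking the reserved slots for human readers.
        lines.append("\t# spare slots: listed, but no daemon running yet")
    for name, ip in spares:
        lines.append(f"\t{name}\t{ip}")
    lines.append("}")
    return "\n".join(lines)


def can_start_without_new_config(ip, spares):
    """True if a new daemon on `ip` can join using the existing config,
    i.e. the address was reserved as a spare slot ahead of time."""
    return ip in {addr for _, addr in spares}


if __name__ == "__main__":
    # Hypothetical hosts and addresses, for illustration only.
    active = [("node1", "192.168.1.10"), ("node2", "192.168.1.11")]
    spares = [("spare1", "192.168.1.20"), ("spare2", "192.168.1.21")]
    print(render_segment("225.0.1.1", 4803, active, spares))
```

The same rendered file would be installed on every machine, including the spare ones, so that a daemon started later on `spare1` only causes a normal membership change.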

This approach has no real costs if you can list the IP addresses ahead of
time. If you can roughly predict how many additional Spread daemons you
will need over time, this might be enough. It is more difficult if the
daemons need to be located on several different networks.

It is theoretically possible to load a new configuration file into a
running network of Spread daemons and pay only the cost of a normal daemon
membership change (just as if a crashed daemon had recovered and
restarted). However, it is VERY tricky to get right. We have developed a
basic approach, but there are many details to work out before we can show
that the result is still a correct implementation.

Depending on how much network traffic you are pushing between your
replicated servers, the solution you describe of running several
application servers connected to each Spread daemon may be a very good
one. Each daemon should be able to support up to a thousand active
connections. However, since each message has to be sent individually to
each connection, it is much more efficient (network-wise) to run a Spread
daemon on every machine that hosts application servers. If the amount of
traffic is small, sending extra copies will probably not be your
bottleneck. If the amount of traffic is large, you probably want one
Spread daemon per machine so that the network multicast service is used
optimally.
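To make the trade-off concrete, here is a back-of-the-envelope cost model in Python. It is a deliberate simplification, not a measurement of Spread: it assumes the daemons exchange each message with a single multicast, that each daemon then hands the message to its clients one connection at a time, and that only clients on a different machine from their daemon cost an extra copy on the wire. Token, acknowledgement and fragmentation traffic are ignored, and the function names are hypothetical.

```python
import math


def network_copies(machines, servers_per_machine, daemon_machines):
    """Rough count of wire copies for one message delivered to every server.

    Assumed model: one multicast among daemons (if there is more than one),
    plus one unicast copy per client that is not on its daemon's machine.
    """
    if not 1 <= daemon_machines <= machines:
        raise ValueError("daemon_machines must be between 1 and machines")
    multicast = 1 if daemon_machines > 1 else 0
    remote_clients = (machines - daemon_machines) * servers_per_machine
    return multicast + remote_clients


def daemons_needed(total_servers, max_connections_per_daemon=1000):
    """Minimum number of daemons if each supports up to
    max_connections_per_daemon client connections."""
    return math.ceil(total_servers / max_connections_per_daemon)


if __name__ == "__main__":
    # 10 machines with 4 application servers each.
    for d in (1, 2, 10):
        print(d, "daemon machine(s):", network_copies(10, 4, d), "copies per message")
```

With 10 machines of 4 servers each, this model gives 36 copies per message with one shared daemon, 33 with two, and 1 with a daemon on every machine, which is the "daemon per machine" advantage described above.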

If your application can handle a switchover (for example, all of its
processes can be sent a coordinated message telling them to switch to the
new configuration --NOW-- and still be correct), then you can use two
separate Spread configurations running on different ports: the active
one, and a new one with more machines in it and its own configuration
file. Then send your application the signal to switch (you may need a
'prepare to switch' signal first so the apps stop initiating new
messages), and they can disconnect from one Spread daemon and connect to
the other one running on a different port. This removes any Spread-caused
reconfiguration delay and lets your app keep running and keep its open
client connections. It just has to pause active work for a short period
of time (probably a few seconds is enough).
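The application side of that switchover can be sketched as a small state machine. This is a hypothetical Python sketch, not part of Spread: `connect(port)` and `disconnect(conn)` stand in for the real client calls (such as `SP_connect` and `SP_disconnect` in the C API) and are injected so the example runs without a Spread installation. The port numbers are also made up.

```python
class SwitchableClient:
    """Hypothetical app-side handler for a coordinated switch between two
    Spread configurations running on different ports."""

    def __init__(self, connect, disconnect, old_port, new_port):
        self._connect = connect          # stand-in for the real connect call
        self._disconnect = disconnect    # stand-in for the real disconnect call
        self.new_port = new_port
        self.conn = connect(old_port)
        self.may_send = True

    def on_prepare(self):
        # 'Prepare to switch': stop initiating new messages.
        self.may_send = False

    def on_switch(self):
        # Pause (in case no prepare was sent), move to the new daemon, resume.
        self.may_send = False
        self._disconnect(self.conn)
        self.conn = self._connect(self.new_port)
        self.may_send = True

    def send(self, msg, outbox):
        """Queue msg on the current connection; False means 'retry later'."""
        if not self.may_send:
            return False
        outbox.append((self.conn, msg))
        return True


if __name__ == "__main__":
    closed, outbox = [], []
    client = SwitchableClient(lambda port: f"conn@{port}", closed.append, 4803, 4903)
    client.send("before", outbox)
    client.on_prepare()
    client.send("during", outbox)   # refused while paused
    client.on_switch()
    client.send("after", outbox)
    print(outbox, closed)
```

Refusing sends during the pause, rather than silently dropping them, leaves it to the application to queue work until the new connection is up, which is what keeps the pause to the few seconds mentioned above.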

Hope this helps, feel free to contact me if you have questions.

Jonathan

On Tue, May 15, 2001 at 10:38:17AM -0400, Ding Yiqiang wrote:
> Hi there,
> 
> Yes, we can help each other. My requirement is to use some group
> communication software to make some communicating servers fault-tolerant.
> 
> One of the problems is scaling. One solution may involve modifying the
> current Spread daemon code so that the network configuration can be changed
> dynamically, but I don't know how much work that would take. Another
> solution is to set up a fixed number of Spread daemons and make all servers
> connect to them. But as the number of servers grows, we would still need to
> add more daemons, which makes reconfiguration/restart inevitable.
> 
> Any comments?
> 
> Yiqiang
> 

> "Ding Yiqiang" <ding_yiqiang at yahoo.com> wrote:
> 
> Hi,
> 
> If more machines are to be added to the group communication map, is it
> necessary to modify all configuration files and restart all the currently
> running daemons?
> If so, the cost would be high.
> 
> Thanks,
> 
> Yiqiang
> 

-- 
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
-------------------------------------------------------