[Spread-users] Re: Newbie: Initial queries

Wed Mar 13 11:39:10 EST 2002

On Wed, Mar 13, 2002 at 02:56:07PM +0000, David Turland wrote:
> Hi again,
> Thanks for your patience :-)
> After getting into the Spread frame of mind I easily found the answer to my 
> question:
> 
> nodes detect the disconnection;
> nodes call:
> 
> SP_disconnect(...); (this may not be necessary)

It is officially necessary. Skipping it might work in some cases now, but it
probably doesn't always work, and it will stop working in the next version.

> 
> ret = SP_connect_timeout(...);
> while (ret!= ACCEPT_SESSION)
> {
>   E_delay(timeout);
>   ret = SP_connect_timeout(...);
> }
> 
> 
> 
> hopefully the master node or another arbitrary node (but fear race condition)
> restarts the spread daemon.

Restarting the Spread daemons has to be done by something outside the system
(which I think is what you mean above).

Spread by default does not provide any way for a client to know which Spread
daemon to connect to. That is considered part of the application because it
depends a lot on what the application is doing. Certainly the simplest
thing is to keep trying to connect to the same server and just wait until it
comes back up.
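
A rough sketch of that simplest approach, in C. So that it compiles without
the Spread library, the three small stubs stand in for SP_disconnect(),
SP_connect_timeout() and E_delay(); the struct and every function name here
are made up for illustration and are not part of Spread. The point is only
the order of operations: disconnect first, then retry the same daemon.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a Spread session. A real client would keep the mailbox
 * from SP_connect_timeout() here. All names are hypothetical. */
struct session {
    int connected;    /* 1 while we hold a live session */
    int daemon_down;  /* fake daemon: refuses this many more attempts */
    int delays;       /* how many times we waited between attempts */
};

/* Stub for SP_disconnect(): release the old session. */
void drop_session(struct session *s) { s->connected = 0; }

/* Stub for SP_connect_timeout(): succeeds once the daemon is back. */
int try_connect(struct session *s)
{
    if (s->daemon_down > 0) {
        s->daemon_down--;
        return 0;
    }
    s->connected = 1;
    return 1;
}

/* Stub for E_delay(): just counts the waits. */
void wait_a_bit(struct session *s) { s->delays++; }

/* Disconnect first, then retry the same daemon. Returns the number of
 * attempts used, or -1 if max_tries ran out. */
int reconnect_same_daemon(struct session *s, int max_tries)
{
    assert(s != NULL && max_tries > 0);
    drop_session(s);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (try_connect(s))
            return attempt;
        if (attempt < max_tries)
            wait_a_bit(s);
    }
    return -1;
}
```

The retry cap is my addition; the loop in your mail retries forever, which is
also fine if the client has nothing else useful to do.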

You could get a faster recovery if you knew of several servers and could
connect to another one if your primary daemon was down. 
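
Something along these lines, again only as a sketch. The daemon list and the
host names are invented, and daemon_accepts() is a hypothetical stand-in for
calling SP_connect_timeout() with that daemon name and checking for
ACCEPT_SESSION; here it simply pretends that one named daemon is up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe: a real client would call SP_connect_timeout()
 * with this daemon name and return 1 on ACCEPT_SESSION. Here the
 * daemon named by only_up is treated as the only one running. */
int daemon_accepts(const char *name, const char *only_up)
{
    return only_up != NULL && strcmp(name, only_up) == 0;
}

/* Try each known daemon in order, starting with the primary at
 * index 0. Returns the index of the first daemon that accepts,
 * or -1 if none of them do. */
int connect_first_available(const char *const daemons[], size_t count,
                            const char *only_up)
{
    assert(daemons != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        if (daemon_accepts(daemons[i], only_up))
            return (int)i;
    }
    return -1;
}
```

With a list such as { "4803@node-a", "4803@node-b", "4803@node-c" } and only
node-b running, connect_first_available() returns 1. How the client learns
the list is up to the application.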

There isn't a race problem with restarting the daemons: on startup the daemon
checks for a previously running daemon on the same port and refuses to
start if it finds one. So it should not matter if multiple nodes try to
restart a daemon. To make it simpler, I recommend using a script like most
sysv startup scripts, one that checks for a running instance and writes the
pid into a file under /var/run/. That gives a nice way outside of Spread to
safely start it.
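
The check such a script does is small. Here it is sketched in C rather than
shell, to match the code earlier in this thread; it assumes a POSIX system,
and the function names and any pid file path you pass in are hypothetical.
A wrapper would call pidfile_is_live() first and only start the daemon (and
then record its pid) when that returns 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the pid recorded in pidfile belongs to a live process,
 * 0 if the file is missing, unreadable, or stale. */
int pidfile_is_live(const char *pidfile)
{
    assert(pidfile != NULL);
    FILE *f = fopen(pidfile, "r");
    if (f == NULL)
        return 0;
    long pid = 0;
    int got = fscanf(f, "%ld", &pid);
    fclose(f);
    if (got != 1 || pid <= 0)
        return 0;
    /* Signal 0 delivers nothing; it only checks that the pid exists.
     * EPERM means it exists but belongs to another user. */
    if (kill((pid_t)pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/* Record the calling process's pid. For simplicity this sketch writes
 * its own pid; a real wrapper would write the daemon's pid instead. */
int pidfile_write_self(const char *pidfile)
{
    assert(pidfile != NULL);
    FILE *f = fopen(pidfile, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%ld\n", (long)getpid());
    return fclose(f) == 0 ? 0 : -1;
}
```

A stale pid file left by a crashed daemon is handled by the kill() probe,
and the daemon's own port check still covers the case where two wrappers
race past the pid file test.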

Just a note about your previous email. If a daemon goes down, all clients
connected to it (as you noticed) will be disconnected. If the clients are
running on multiple machines, it could be beneficial to run multiple daemons,
one on each machine. Then when one goes down, all the other clients stay up
and connected to their local daemons, and the daemons reconfigure
automatically to exclude the crashed daemon. When the crashed daemon comes
back up, it will be included again in the configuration along with any
clients who are using it.

Jonathan

-- 
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
-------------------------------------------------------