[Spread-users] operation after partition heals

Jonathan Stanton jonathan at cnds.jhu.edu
Wed May 18 19:36:00 EDT 2005

You should receive a second membership showing that the partition has 
healed. How long that takes depends on your network setup and traffic. 

By default, if no messages are seen, every 60-90 seconds each daemon 
will 'ping' all the other daemons it can't talk to, to try to 
'reconnect' with them. So if nothing else happens before then, that 
should trigger the merge. More often, with LAN setups, if any messages 
are sent through Spread once the wire is plugged back in, the other 
daemons will see those broadcasts, learn that the other daemons are 
alive/connected again, and trigger a membership change then. That is 
much faster -- just a few seconds after the message. This doesn't work 
with wide-area daemons, as they don't get the LAN broadcasts. 
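
As a rough illustration of the two recovery paths above, here is a small 
timing model. It is only a sketch under stated assumptions, not Spread 
code: the function name, its parameters, and the 3-second figure standing 
in for "a few seconds" are all hypothetical. 

```python
from typing import Optional


def estimate_merge_delay(probe_delay: float,
                         first_message_delay: Optional[float],
                         same_lan: bool,
                         merge_latency: float = 3.0) -> float:
    """Estimate seconds from link restore until a merge is triggered.

    probe_delay: time until the next periodic 'ping' of unreachable
        daemons (at most 90 s with the default 60-90 s timeout).
    first_message_delay: time until the first message is sent through
        Spread after the link is restored, or None if no traffic is sent.
    same_lan: True if the daemons can see each other's LAN broadcasts.
    merge_latency: ASSUMED stand-in for "a few seconds after the message".
    """
    if not 0 <= probe_delay <= 90:
        raise ValueError("probe_delay must be between 0 and 90 seconds")

    # The periodic probe is always available as a fallback.
    candidates = [probe_delay]

    # The broadcast path only applies on a LAN and only if traffic flows.
    if same_lan and first_message_delay is not None:
        candidates.append(first_message_delay + merge_latency)

    return min(candidates)
```

For example, with a probe due in 75 seconds and a message sent 2 seconds 
after the cable is plugged back in, the model picks the broadcast path on 
a LAN and falls back to the probe for wide-area daemons. 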

The 60-90 second timeout can be changed by editing membership.c
(Lookup_timeout) and recompiling. Some users do that for specialized
environments where they need fast recovery and are willing to pay for
it with increased bandwidth overhead. 

If you do not see any merge after about 90 seconds, then something is 
wrong; please email me back. 
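
If you want to automate that check in a test harness, a minimal sketch 
might look like the following. It does not use the Spread API; the 
function name and the timestamps are hypothetical, and your application 
would record them itself (for example, when the cable is plugged back in 
and when a membership message showing the merge arrives). 

```python
from typing import Optional


def merge_overdue(healed_at: float,
                  now: float,
                  merged_at: Optional[float] = None,
                  deadline: float = 90.0) -> bool:
    """Return True if no merge was seen within `deadline` seconds.

    healed_at: time (seconds) when the network link was restored.
    now: current time on the same clock.
    merged_at: time a merged membership was observed, or None if not yet.
    deadline: 90 s matches the upper end of the default 60-90 s timeout.
    """
    # A merge observed after the heal means nothing is overdue.
    if merged_at is not None and merged_at >= healed_at:
        return False
    # Otherwise, flag the problem only once the deadline has passed.
    return (now - healed_at) > deadline
```

Calling merge_overdue(healed_at=0.0, now=120.0) would flag a problem, 
while the same call with merged_at=5.0 would not. 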



On Wed, May 18, 2005 at 06:33:16PM -0400, Scott Barvick wrote:
> What is the expected operation after a network partition heals?  I get
> the membership change messages when the partition occurs so that each of
> 2 nodes thinks it is the only one, but when the partition heals (I plug
> the ethernet cable back in), I don't see any awareness of the other node
> on either side.  Shouldn't the multicast/broadcast messages find their
> way into Spread and kick something off?
> More specifically, I have 2 nodes and I pull the network interface cable
> out of one of them.  When I plug it back in, I can ping.  I haven't
> looked at the network traffic yet, but I can do that.  I have MEMBERSHIP
> logging turned on and I see the token and membership operations at the
> partition, but there is no output when the partition heals.
> Thanks for any advice,
> Scott

Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
