[Spread-users] operation after partition heals
Jonathan Stanton
jonathan at cnds.jhu.edu
Wed May 18 19:36:00 EDT 2005
You should receive a second membership showing that the partition has
healed. How long that takes depends on your network setup and traffic.
By default, if no messages are seen, every 60-90 seconds each daemon
will 'ping' all the other daemons it can't talk to, to try to
'reconnect' with them. So if nothing happens before then, that should
trigger the merge. More often, with LAN setups, if any messages are sent
through Spread once the wire is plugged back in, the other daemons
will see those broadcasts, learn that the other daemons are
alive/connected again, and trigger a membership change then. That is
much faster: just a few seconds after the message. This doesn't work
with wide-area daemons, as they don't get the LAN broadcasts.
The 60-90 second timeout can be changed by editing membership.c
(Lookup_timeout) and recompiling. Some users do that for specialized
environments where they need fast recovery and are willing to pay for
it with increased bandwidth overhead.
If you do not see any merge after about 90 seconds, then something is
wrong; please email me back.
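
If it helps while testing, the 90-second rule of thumb above can be
turned into a small application-side check. This is only a sketch: the
heal_watch structure and its functions are hypothetical names kept by
your own program, not part of the Spread API, and the 90-second
deadline simply assumes the default timeout has not been changed.

```c
#include <stdbool.h>
#include <time.h>

/* Assumption: upper bound of the default 60-90 second probe interval. */
#define MERGE_DEADLINE_SECS 90.0

/* Hypothetical bookkeeping kept by the application, not by Spread. */
struct heal_watch {
    time_t healed_at; /* when the cable was plugged back in */
    bool   merged;    /* set once the merged membership has arrived */
};

/* Call when the link is restored. */
static void heal_watch_start(struct heal_watch *w, time_t now)
{
    w->healed_at = now;
    w->merged = false;
}

/* Call when a membership message listing the other node is delivered. */
static void heal_watch_merged(struct heal_watch *w)
{
    w->merged = true;
}

/* True if no merge has been seen and the deadline has passed. */
static bool heal_watch_overdue(const struct heal_watch *w, time_t now)
{
    return !w->merged && difftime(now, w->healed_at) > MERGE_DEADLINE_SECS;
}
```

Call heal_watch_start() when the link comes back, heal_watch_merged()
from wherever you handle membership messages, and poll
heal_watch_overdue() periodically; a true result is the "something is
wrong" case.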
Cheers,
Jonathan
On Wed, May 18, 2005 at 06:33:16PM -0400, Scott Barvick wrote:
> What is the expected operation after a network partition heals? I get
> the membership change messages when the partition occurs so that each of
> 2 nodes thinks it is the only one, but when the partition heals (I plug
> the ethernet cable back in), I don't see any awareness of the other node
> on either side. Shouldn't the multicast/broadcast messages find their
> way into Spread and kick something off?
>
> More specifically, I have 2 nodes and I pull the network interface cable
> out of one of them. When I plug it back in, I can ping. I haven't
> looked at the network traffic yet, but I can do that. I have MEMBERSHIP
> logging turned on and I see the token and membership operations at the
> partition, but there is no output when the partition heals.
>
> Thanks for any advice,
> Scott
>
>
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users
--
-------------------------------------------------------
Jonathan R. Stanton jonathan at cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------