[Spread-users] operation after partition heals
Scott Barvick
sbarvick at revasystems.com
Fri May 20 14:41:23 EDT 2005
Following up, I think the answer in this case is (in membership.c):
    /* Lookup timeout when only one segment exists can be longer,
     * since a no remote segments need to be probed
     */
    if ( Cn.num_segments == 1 )
        Lookup_timeout.sec = 300;
We only have one segment, but we are using Spread over multicast with
IGMP snooping/filtering, and it looks like I need the unicasts to kick
things over (5 minutes later). On a hub, the repair is quick.
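
For anyone who wants to experiment with a shorter single-segment value, the logic quoted above can be sketched in isolation roughly as follows. This is only an illustration, not the daemon's code: the types `sketch_time` and `sketch_conf`, the function `pick_lookup_timeout`, and the 60-second multi-segment default are hypothetical stand-ins (the 60 is taken from the "60-90 second" figure quoted below, not from the source). Only `num_segments`, the `sec` field and the 300-second value come from the membership.c excerpt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the daemon's time and configuration types;
 * these are NOT the real Spread definitions. */
typedef struct { long sec; long usec; } sketch_time;
typedef struct { int num_segments; } sketch_conf;

/* Assumed default, based on the "60-90 seconds" mentioned in the thread. */
#define SKETCH_MULTI_SEGMENT_LOOKUP_SEC 60

/* Return the lookup timeout to use. With exactly one segment, the caller's
 * single_segment_sec is used (the excerpt hard-codes 300 here); otherwise
 * the assumed multi-segment default is kept. */
static sketch_time pick_lookup_timeout(const sketch_conf *cn,
                                       long single_segment_sec)
{
    sketch_time t = { SKETCH_MULTI_SEGMENT_LOOKUP_SEC, 0 };

    assert(cn != NULL);
    assert(single_segment_sec > 0);

    if (cn->num_segments == 1)
        t.sec = single_segment_sec;

    return t;
}
```

Calling `pick_lookup_timeout(&conf, 300)` with `conf.num_segments == 1` reproduces the behavior in the excerpt; passing a smaller value such as 30 models the change I would need for the IGMP-snooped network, at the cost of more frequent probe traffic. In the real daemon the change would still mean editing membership.c and recompiling.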
Thanks,
Scott
On Wed, 2005-05-18 at 19:36, Jonathan Stanton wrote:
> You should receive a second membership showing that the partition has
> healed. How long that takes depends on your network setup and traffic.
>
> By default, if no messages are seen, every 60-90 seconds each daemon
> will 'ping' all the other daemons it can't talk to to try and
> 'reconnect' with them. So if nothing happens before then, that should
> trigger the merge. More often, with LAN setups, if any messages are sent
> through spread once the wire is plugged back in, the other daemons
> will see those broadcasts and thus learn that other daemons are
> alive/connected again and should trigger a membership then. That is much
> faster -- just a few seconds after the message. This doesn't work with
> wide-area daemons as they don't get the lan broadcasts.
>
> The 60-90 second timeout can be changed by editing membership.c
> (Lookup_timeout) and recompiling and some users do that for specialized
> environments where they need fast recovery and are willing to pay an
> increased bandwidth overhead cost.
>
> If you do not see any merge after about 90 seconds or so, then something
> strange is going on, so please email me back.
>
> Cheers,
>
> Jonathan
>
> On Wed, May 18, 2005 at 06:33:16PM -0400, Scott Barvick wrote:
> > What is the expected operation after a network partition heals? I get
> > the membership change messages when the partition occurs so that each of
> > 2 nodes thinks it is the only one, but when the partition heals (I plug
> > the ethernet cable back in), I don't see any awareness of the other node
> > on either side. Shouldn't the multicast/broadcast messages find their
> > way into Spread and kick something off?
> >
> > More specifically, I have 2 nodes and I pull the network interface cable
> > out of one of them. When I plug it back in, I can ping. I haven't
> > looked at the network traffic yet, but I can do that. I have MEMBERSHIP
> > logging turned on and I see the token and membership operations at the
> > partition, but there is no output when the partition heals.
> >
> > Thanks for any advice,
> > Scott
> >
> >
> > _______________________________________________
> > Spread-users mailing list
> > Spread-users at lists.spread.org
> > http://lists.spread.org/mailman/listinfo/spread-users