[Spread-users] Spread Reconnect?

John Schultz jschultz at spreadconcepts.com
Fri Aug 27 14:23:28 EDT 2010


Spread (re)discovers connectivity between daemons in two different ways: (1) by hearing regular (i.e., user) multicast traffic from daemons that are configured in its segment but to which it is not currently connected, or (2) by periodically probing (or being probed by) daemons in its segment and remote daemons.  I believe that once daemons start trying to form a new membership (including newly started daemons), they immediately probe as well.

The first mechanism can only be triggered if you actually have user traffic flowing.  If none of your clients are sending anything, then previously partitioned daemons (e.g., after pulling a machine's network cable) that have established memberships, even singleton ones, can sit in a segment blissfully unaware of each other until they decide to probe or are probed.  The moment one of your clients sends something, all connected daemons in the same segment will hear it and they will form a new membership very quickly.

The probing mechanism typically has rather long timeouts, and sometimes very long ones.  Depending on your configuration, the default probing timeout is 60, 90 or 300 seconds.  That should be an upper bound on how long two connected daemons might ignore one another.  Which of the three timeouts your configuration uses depends on how many segments you have and whether or not the IP addresses of your segments are considered "near" or "far."  If you only have a single segment, then you will use the 300 second timeout.  Otherwise, if all your daemon IP addresses have the same top two bytes (i.e., A and B in A.B.C.D), then you will use the 60 second timeout.  Otherwise you will use the 90 second timeout.
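
To make the selection rule above concrete, here is a minimal sketch in Python. It only restates the three cases described in this message; it is not taken from the Spread source, and the function name `probe_timeout_seconds` and the list-of-lists input format are hypothetical choices made for illustration.

```python
import ipaddress


def probe_timeout_seconds(segments):
    """Return the default probing timeout (seconds) for a configuration.

    `segments` is a list of segments, each a non-empty list of daemon
    IPv4 address strings. This input shape is an assumption made for
    this sketch, not Spread's configuration format.
    """
    if not segments or any(not seg for seg in segments):
        raise ValueError("need at least one segment, each with a daemon")

    # Single segment: the long 300 second timeout applies.
    if len(segments) == 1:
        return 300

    # Multiple segments: compare the top two bytes (A.B in A.B.C.D)
    # of every daemon address in the configuration.
    prefixes = {
        ipaddress.IPv4Address(ip).packed[:2]
        for seg in segments
        for ip in seg
    }
    return 60 if len(prefixes) == 1 else 90
```

For example, two segments holding 10.1.0.1 and 10.1.5.2 share the prefix 10.1 and so map to 60 seconds, while 10.1.0.1 and 10.2.0.1 map to 90 seconds.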

So, if you only have a single segment, clients aren't sending traffic, and you pull the network plug on a daemon, allow the daemons to reestablish memberships, and then plug the machine back in, it could well take up to 5 minutes before the daemons realize they can reconnect.  Typically, it will occur faster, as different daemons will trip that timeout at different times (I think).
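
Since any user traffic triggers the first discovery mechanism, one possible workaround for an otherwise idle application is to have a client send a small message whenever it has been quiet for a while. The following is only a sketch of that idea in Python under stated assumptions: `IdleKeepalive` and `send_fn` are hypothetical names, `send_fn` stands in for whatever call your client library uses to multicast a message to a group, and the 5 second interval is an arbitrary example value, not a Spread default.

```python
import time


class IdleKeepalive:
    """Call `send_fn` whenever no traffic has been seen for `interval_s`.

    `send_fn` is a caller-supplied callable (hypothetical here) that
    sends one small message through the group communication client.
    """

    def __init__(self, send_fn, interval_s=5.0, clock=time.monotonic):
        self._send = send_fn
        self._interval = interval_s
        self._clock = clock
        self._last = clock()  # time of the most recent traffic

    def note_traffic(self):
        """Record that the application just sent a real message."""
        self._last = self._clock()

    def poll(self):
        """Send a keepalive if the idle interval has elapsed.

        Returns True if a keepalive was sent on this call.
        """
        now = self._clock()
        if now - self._last >= self._interval:
            self._send()
            self._last = now
            return True
        return False
```

An application would call `note_traffic()` after each real send and `poll()` from its main loop or a timer. This only helps daemons that can hear each other's multicast traffic in the same segment; for anything else the probing timeouts still apply.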

As to the second part of your first question and your second question, I'm not really following you.  The membership list remains valid for all messages delivered in a membership, until the next membership list is delivered.  If you quickly pull and replug a network cable, then Spread will quietly handle any hiccup you might have caused and the membership will not change and communication will continue.

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Aug 27, 2010, at 1:29 PM, Marc Fiedler wrote:

Hi guys,

I have a small problem with Spread not reconnecting after losing the network connection, for instance if the network cable was pulled out.

So I have two questions:

1. Is there any possibility that Spread reconnects after that?
   For the standard communication in the Spread group this is no problem at all: after plugging the cable back in, the communication goes on, but the Spread member lists are not working any more then. Which brings me to my second question...

2. Is there some sort of "refresh" message that could be sent around in the Spread group, so that everyone gets a new member list?

I'm working on that now, but it would be really great if you had some advice on how to deal with this problem in Spread.

Thanks.

cheers

_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users
