[Spread-users] Recovering from a network failure

John Lane Schultz jschultz at spreadconcepts.com
Fri Dec 13 09:30:42 EST 2013


Spread is designed to handle these kinds of network faults.  Depending on your timeouts, Spread will detect a network fault within seconds and re-form with whatever membership it can reach within another few seconds.  When the network heals, the daemons will reconnect either when they eventually probe one another (after another timeout) or when they hear multicast or broadcast user traffic.
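
To make the role of the timeouts concrete, here is a minimal sketch of timeout-based fault detection in general.  It is not Spread's actual membership code; the class name, the 5-second timeout, and the second address (192.168.0.11) are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FailureDetector:
    """Toy detector: a peer counts as reachable if heard from within `timeout` seconds."""
    timeout: float
    last_heard: dict = field(default_factory=dict)

    def heard_from(self, peer, now):
        # Record the time of the most recent message from this peer.
        self.last_heard[peer] = now

    def reachable(self, now):
        # Peers that have been silent longer than the timeout drop out of the view.
        return sorted(p for p, t in self.last_heard.items()
                      if now - t <= self.timeout)


fd = FailureDetector(timeout=5.0)       # hypothetical timeout value
fd.heard_from("192.168.0.10", 0.0)
fd.heard_from("192.168.0.11", 0.0)      # hypothetical second daemon
fd.heard_from("192.168.0.10", 4.0)      # .11 has gone silent (network fault)
print(fd.reachable(6.0))                # ['192.168.0.10']
fd.heard_from("192.168.0.11", 7.0)      # network heals, traffic is heard again
print(fd.reachable(8.0))                # ['192.168.0.10', '192.168.0.11']
```

The point of the sketch is only that both detection and recovery are driven by when traffic was last heard, so the length of the outage itself should not matter once messages flow again.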

If Spread is unable to complete a membership when the network is reconnected, then it is likely that something, possibly in the network (e.g., firewalls, faulty NICs, etc.), is pathologically preventing some of the messages from getting from some daemons to others.
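
One rough way to check whether a daemon keeps retrying the membership is to tally the events in its log.  The sketch below assumes the log format seen in the excerpt quoted in this thread (a bracketed timestamp followed by an event name); the function name and regular expression are my own and are not part of Spread.

```python
import re
from collections import Counter

# Assumed line shape: "[<timestamp>] <EventName>..." as in the quoted excerpt.
LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<event>\w+)")


def count_events(log_text):
    """Count log events by name; wrapped continuation lines are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m:
            counts[m.group("event")] += 1
    return counts


sample = """\
[Fri 13 Dec 2013 14:17:44] Send_join: State is 4
[Fri 13 Dec 2013 14:17:45] Memb_handle_message: handling join message
from 192.168.0.10, State is 4
[Fri 13 Dec 2013 14:17:45] Send_join: State is 4
[Fri 13 Dec 2013 14:17:46] Memb_handle_token: handling form1 token
[Fri 13 Dec 2013 14:17:46] Handle_form1 in GATHER
"""
for event, n in count_events(sample).most_common():
    print(event, n)
```

If the join events keep growing on every daemon while the form events never appear, that would be consistent with some messages not getting through, though the tally alone does not say where they are being lost.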

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Cell: 443 838 2200

On Dec 13, 2013, at 9:23 AM, Göran Hasse <gorhas at gmail.com> wrote:

Hello!

I am simulating a network failure by unplugging the switch for a while. If
I have the network down for some minutes, the spread-bus will not recover.

I just get a lot of:

Memb_token_loss: I lost my token, state is 6
[Fri 13 Dec 2013 14:17:39] Scast_alive: State is 2
[Fri 13 Dec 2013 14:17:40] Scast_alive: State is 2
[Fri 13 Dec 2013 14:17:41] Memb_handle_message: handling join message from 192.168.0.10, State is 2
[Fri 13 Dec 2013 14:17:41] Send_join: State is 4
[Fri 13 Dec 2013 14:17:42] Memb_handle_message: handling join message from 192.168.0.10, State is 4
[Fri 13 Dec 2013 14:17:42] Send_join: State is 4
[Fri 13 Dec 2013 14:17:43] Memb_handle_message: handling join message from 192.168.0.10, State is 4
[Fri 13 Dec 2013 14:17:43] Send_join: State is 4
[Fri 13 Dec 2013 14:17:44] Memb_handle_message: handling join message from 192.168.0.10, State is 4
[Fri 13 Dec 2013 14:17:44] Send_join: State is 4
[Fri 13 Dec 2013 14:17:45] Memb_handle_message: handling join message from 192.168.0.10, State is 4
[Fri 13 Dec 2013 14:17:45] Send_join: State is 4
[Fri 13 Dec 2013 14:17:46] Memb_handle_token: handling form1 token
[Fri 13 Dec 2013 14:17:46] Handle_form1 in GATHER
[Fri 13 Dec 2013 14:17:46] Memb_handle_token: handling form1 token
[Fri 13 Dec 2013 14:17:46] Handle_form1 in FORM
[Fri 13 Dec 2013 14:17:46] Memb_handle_token: handling form2 token
[Fri 13 Dec 2013 14:17:46] Handle_form2 in FORM
[Fri 13 Dec 2013 14:17:46] Memb_handle_token: handling form2 token
[Fri 13 Dec 2013 14:17:46] Handle_form2 in EVS

Is there any documentation on how Spread will behave on a network fault?

How long can the network be down before one experiences problems?

/gh

-- 
gorhas at gmail.com
Göran Hasse
Boo 229
715 91  ODENSBACKEN
Mob: 070-5530148

_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users



