[Spread-users] Member token loss at regular-ish 5 minute intervals
Barry Abrahamson
barry at automattic.com
Thu May 10 22:56:55 EDT 2007
We are using spread + wackamole + pound to achieve an HA solution for
our application. We have 5 pairs of servers deployed across the
country (1 pair per datacenter), and most of them work just fine.
There is this one pair of servers, however, that is having problems.
Roughly every 5 minutes, one of two things happens:
Case 1:
(on segment leader)
[Fri 11 May 2007 02:33:43] Send_join: State is 4
[Fri 11 May 2007 02:33:44] Send_join: State is 4
[Fri 11 May 2007 02:33:45] Send_join: State is 4
[Fri 11 May 2007 02:33:46] Send_join: State is 4
[Fri 11 May 2007 02:33:47] Send_join: State is 4
(on segment member)
[Fri 11 May 2007 02:33:43] Memb_handle_message: handling join message from -1408237407, State is 1
[Fri 11 May 2007 02:33:43] Handle_join in OP
Case 2:
(on segment leader)
[Fri 11 May 2007 02:39:27] Memb_token_loss: I lost my token, state is 1
[Fri 11 May 2007 02:39:27] Scast_alive: State is 2
[Fri 11 May 2007 02:39:27] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:27] Handle_alive in SEG
[Fri 11 May 2007 02:39:28] Scast_alive: State is 2
[Fri 11 May 2007 02:39:28] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:28] Handle_alive in SEG
[Fri 11 May 2007 02:39:32] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:32] Handle_alive in SEG
[Fri 11 May 2007 02:39:33] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:33] Handle_alive in SEG
[Fri 11 May 2007 02:39:34] Memb_handle_message: handling join message from -1408237406, State is 2
[Fri 11 May 2007 02:39:34] Scast_alive: State is 3
[Fri 11 May 2007 02:39:35] Memb_handle_message: handling join message from -1408237406, State is 3
[Fri 11 May 2007 02:39:35] Scast_alive: State is 3
[Fri 11 May 2007 02:39:36] Memb_handle_message: handling join message from -1408237406, State is 3
[Fri 11 May 2007 02:39:36] Scast_alive: State is 3
[Fri 11 May 2007 02:39:37] Memb_handle_message: handling join message from -1408237406, State is 3
[Fri 11 May 2007 02:39:37] Scast_alive: State is 3
[Fri 11 May 2007 02:39:38] Memb_handle_message: handling join message from -1408237406, State is 3
[Fri 11 May 2007 02:39:38] Scast_alive: State is 3
[Fri 11 May 2007 02:39:39] Memb_handle_token: handling form1 token
[Fri 11 May 2007 02:39:39] Handle_form1 in REPRESENTED
[Fri 11 May 2007 02:39:39] Memb_handle_token: handling form1 token
[Fri 11 May 2007 02:39:39] Handle_form1 in FORM
[Fri 11 May 2007 02:39:39] Memb_handle_token: handling form2 token
[Fri 11 May 2007 02:39:39] Handle_form2 in FORM
[Fri 11 May 2007 02:39:39] Memb_handle_token: handling form2 token
[Fri 11 May 2007 02:39:39] Handle_form2 in EVS
[Fri 11 May 2007 02:39:39] Memb_transitional
[Fri 11 May 2007 02:39:39] Memb_regular
(on segment member)
[Fri 11 May 2007 02:39:27] Memb_token_loss: I lost my token, state is 1
[Fri 11 May 2007 02:39:27] Scast_alive: State is 2
[Fri 11 May 2007 02:39:28] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:28] Handle_alive in SEG
[Fri 11 May 2007 02:39:28] Scast_alive: State is 2
[Fri 11 May 2007 02:39:32] Scast_alive: State is 2
[Fri 11 May 2007 02:39:33] Scast_alive: State is 2
[Fri 11 May 2007 02:39:34] Send_join: State is 4
[Fri 11 May 2007 02:39:34] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:34] Handle_alive in GATHER
[Fri 11 May 2007 02:39:35] Send_join: State is 4
[Fri 11 May 2007 02:39:35] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:35] Handle_alive in GATHER
[Fri 11 May 2007 02:39:36] Send_join: State is 4
[Fri 11 May 2007 02:39:36] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:36] Handle_alive in GATHER
[Fri 11 May 2007 02:39:37] Send_join: State is 4
[Fri 11 May 2007 02:39:37] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:37] Handle_alive in GATHER
[Fri 11 May 2007 02:39:38] Send_join: State is 4
[Fri 11 May 2007 02:39:38] Memb_handle_message: handling alive message
[Fri 11 May 2007 02:39:38] Handle_alive in GATHER
[Fri 11 May 2007 02:39:39] Memb_handle_token: handling form2 token
[Fri 11 May 2007 02:39:39] Handle_form2 in FORM
[Fri 11 May 2007 02:39:39] Memb_handle_token: handling form2 token
[Fri 11 May 2007 02:39:39] Handle_form2 in EVS
[Fri 11 May 2007 02:39:39] Memb_transitional
[Fri 11 May 2007 02:39:39] Memb_regular
Sometimes when Case 2 happens, the segment comes back as 1 member
(which causes wackamole to initiate the rebalance process) and then
adds the second member within seconds (which causes wackamole to
rebalance again). Most of the time, however, after the token loss is
initiated on both machines, they both come back as a part of the same
spread segment and there is no visible effect on wackamole or our
application(s).
We are using spread 3.17.03 from the sarge AMD64 repo across all
servers, with the default timeouts in membership.c.
Any ideas on what is causing these events at these seemingly regular
5 minute intervals would be most helpful.
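To check how regular the interval really is, a small script along these lines could pull the timestamps out of the daemon log and measure the gap between episodes. This is only a minimal sketch: the function names and the 60-second burst threshold are assumptions for illustration and are not part of spread. It expects the "[Fri 11 May 2007 02:39:27] Event: ..." line format shown in the excerpts above.

```python
import re
from datetime import datetime

# Timestamp and event name as they appear in the log excerpts above.
# The leading "[" is optional so a line that lost its bracket still parses.
LINE_RE = re.compile(
    r"^\[?(\w{3} \d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\]\s+(\w+)"
)

def event_times(lines, events=("Memb_token_loss", "Send_join")):
    """Return the timestamps of log lines whose event name is in `events`."""
    times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2) in events:
            times.append(datetime.strptime(m.group(1), "%a %d %b %Y %H:%M:%S"))
    return times

def episode_starts(times, burst_seconds=60):
    """Collapse bursts of events (e.g. one Send_join per second) into the
    time of the first event in each burst. burst_seconds is an assumed
    threshold, not a spread setting."""
    starts = []
    last = None
    for t in sorted(times):
        if last is None or (t - last).total_seconds() > burst_seconds:
            starts.append(t)
        last = t
    return starts

def intervals_seconds(starts):
    """Seconds between consecutive episode starts."""
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]

if __name__ == "__main__":
    # Leader-side lines taken from Case 1 and Case 2 above.
    sample = [
        "[Fri 11 May 2007 02:33:43] Send_join: State is 4",
        "[Fri 11 May 2007 02:33:44] Send_join: State is 4",
        "[Fri 11 May 2007 02:33:45] Send_join: State is 4",
        "[Fri 11 May 2007 02:33:46] Send_join: State is 4",
        "[Fri 11 May 2007 02:33:47] Send_join: State is 4",
        "[Fri 11 May 2007 02:39:27] Memb_token_loss: I lost my token, state is 1",
    ]
    print(intervals_seconds(episode_starts(event_times(sample))))
```

Run over a full day of logs from both machines, the list of intervals should show whether the events track a fixed period (which would point at something scheduled, such as a cron job or a network device timer) or only cluster loosely around 5 minutes.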
Thanks,
Barry