[Spread-users] Total freeze of all daemons after long running
Nico Meyer
nmeyer at virtualminds.de
Thu Mar 13 15:38:30 EDT 2008
Hi,
I reported the exact same problem two months ago, but never got any answer.
Please see
http://commedia.cnds.jhu.edu/pipermail/spread-users/2008-January/003653.html
for the original post.
Today it happened again (exactly 2 months later, but this is most likely a
coincidence). The logs also show the same numbers (compare with my original
post):
[Thu 13 Mar 2008 15:04:28] Prot_handle_token: BUG WORKAROUND: Too many rounds
in EVS state; swallowing token; state:
[Thu 13 Mar 2008 15:04:28] Aru: -2147481909
[Thu 13 Mar 2008 15:04:28] My_aru: -2147481909
[Thu 13 Mar 2008 15:04:28] Highest_seq: 2147482054
[Thu 13 Mar 2008 15:04:28] Highest_fifo_seq: 21344
[Thu 13 Mar 2008 15:04:28] Last_discarded: 2147482054
[Thu 13 Mar 2008 15:04:28] Last_delivered: 2147482054
[Thu 13 Mar 2008 15:04:28] Last_seq: -2147481909
[Thu 13 Mar 2008 15:04:28] Token_rounds: 501
[Thu 13 Mar 2008 15:04:28] Last Token:
[Thu 13 Mar 2008 15:04:28] type: 0x80050080
[Thu 13 Mar 2008 15:04:28] transmiter_id: -1062731508
[Thu 13 Mar 2008 15:04:28] seq: 0
[Thu 13 Mar 2008 15:04:28] proc_id: -1062731508
[Thu 13 Mar 2008 15:04:28] aru: -2147481909
[Thu 13 Mar 2008 15:04:28] aru_last_id: 0
[Thu 13 Mar 2008 15:04:28] flow_control: 0
[Thu 13 Mar 2008 15:04:28] rtr_len: 1440
[Thu 13 Mar 2008 15:04:28] conf_hash: -2002019299
This is repeated every few seconds.
A little later:
[Thu 13 Mar 2008 15:06:19] Prot_handle_token: BUG WORKAROUND: Too many rounds
in EVS state; swallowing token; state:
[Thu 13 Mar 2008 15:06:19] Aru: 3333
[Thu 13 Mar 2008 15:06:19] My_aru: 3333
[Thu 13 Mar 2008 15:06:19] Highest_seq: 2147482054
[Thu 13 Mar 2008 15:06:19] Highest_fifo_seq: 21344
[Thu 13 Mar 2008 15:06:19] Last_discarded: 2147482054
[Thu 13 Mar 2008 15:06:19] Last_delivered: 2147482054
[Thu 13 Mar 2008 15:06:19] Last_seq: 3333
[Thu 13 Mar 2008 15:06:19] Token_rounds: 501
[Thu 13 Mar 2008 15:06:19] Last Token:
[Thu 13 Mar 2008 15:06:19] type: 0x80050080
[Thu 13 Mar 2008 15:06:19] transmiter_id: -1062731508
[Thu 13 Mar 2008 15:06:19] seq: 1
[Thu 13 Mar 2008 15:06:19] proc_id: -1062731508
[Thu 13 Mar 2008 15:06:19] aru: 3333
[Thu 13 Mar 2008 15:06:19] aru_last_id: -1062731505
[Thu 13 Mar 2008 15:06:19] flow_control: 0
[Thu 13 Mar 2008 15:06:19] rtr_len: 1440
[Thu 13 Mar 2008 15:06:19] conf_hash: -2002019299
This is also logged repeatedly.
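A note on the numbers in the two dumps above: they are consistent with a sequence counter wrapping around at the signed 32-bit limit. Highest_seq (2147482054) sits just below 2^31 - 1, and adding the later Aru value (3333) to it and truncating to 32 bits gives exactly the negative Aru from the first dump. The sketch below only checks that arithmetic. It is an observation about the logged values, not a confirmed diagnosis, and `to_int32` is a helper written for this illustration, not part of Spread.

```python
def to_int32(n):
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


INT32_MAX = 2**31 - 1

highest_seq = 2147482054  # Highest_seq / Last_delivered / Last_discarded in the logs
aru_later = 3333          # Aru in the 15:06:19 dump

# Distance between the highest sequence number and the signed 32-bit limit.
print(INT32_MAX - highest_seq)            # 1593

# Highest_seq + 3333 overflows a signed 32-bit integer and wraps negative,
# which matches the Aru value in the 15:04:28 dump.
print(to_int32(highest_seq + aru_later))  # -2147481909
```

If the counters really are plain 32-bit signed integers, any comparison such as "Aru < Highest_seq" would change its result at this point, which might explain why the token keeps circulating. That part is a guess.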
On another server in the same Spread segment:
[Thu 13 Mar 2008 15:04:28] Prot_handle_token: BUG WORKAROUND: Too many rounds
in EVS state; swallowing token; state:
[Thu 13 Mar 2008 15:04:28] Aru: -2147481909
[Thu 13 Mar 2008 15:04:28] My_aru: -2147481909
[Thu 13 Mar 2008 15:04:28] Highest_seq: 2147482054
[Thu 13 Mar 2008 15:04:28] Highest_fifo_seq: 746858665
[Thu 13 Mar 2008 15:04:28] Last_discarded: 2147482054
[Thu 13 Mar 2008 15:04:28] Last_delivered: 2147482054
[Thu 13 Mar 2008 15:04:28] Last_seq: -2147481909
[Thu 13 Mar 2008 15:04:28] Token_rounds: 501
[Thu 13 Mar 2008 15:04:28] Last Token:
[Thu 13 Mar 2008 15:04:28] type: 0x80050080
[Thu 13 Mar 2008 15:04:28] transmiter_id: -1062731497
[Thu 13 Mar 2008 15:04:28] seq: 0
[Thu 13 Mar 2008 15:04:28] proc_id: -1062731497
[Thu 13 Mar 2008 15:04:28] aru: -2147481909
[Thu 13 Mar 2008 15:04:28] aru_last_id: 0
[Thu 13 Mar 2008 15:04:28] flow_control: 0
[Thu 13 Mar 2008 15:04:28] rtr_len: 1440
[Thu 13 Mar 2008 15:04:28] conf_hash: -2002019299
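To make the transmiter_id, proc_id and aru_last_id fields in the dumps above easier to read, the following sketch decodes them under the assumption that they are IPv4 addresses printed as signed 32-bit integers. That assumption is not confirmed by the logs themselves, although the values do decode to private addresses. `id_to_ipv4` is a name made up for this example.

```python
import ipaddress


def id_to_ipv4(signed_id):
    """Reinterpret a signed 32-bit id as an unsigned value and format it as IPv4.

    Assumption: the id is an IPv4 address stored in a 32-bit integer.
    """
    return str(ipaddress.IPv4Address(signed_id & 0xFFFFFFFF))


# Ids taken from the log dumps above.
for label, value in [
    ("proc_id (first server)", -1062731508),
    ("aru_last_id", -1062731505),
    ("proc_id (second server)", -1062731497),
]:
    print(label, value, "->", id_to_ipv4(value))
```

Under that assumption the three ids correspond to 192.168.1.12, 192.168.1.15 and 192.168.1.23.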
After the last incident I raised the timeouts in membership.c a little, so my
assumption that the low timeouts are the culprit seems to be wrong.
Please have a look this time, as this is a really serious problem.
My workaround for now will be to put an exit(0) in the code where the message
is generated, and to restart the daemons after 5s. Hopefully this will at least
recover the Spread ring.
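The restart half of this workaround could look roughly like the wrapper below: run the daemon, and whenever it exits, wait 5 seconds and start it again. This is a minimal sketch, not a tested deployment. The function name `supervise`, its parameters, and the command line in the usage comment are all placeholders chosen for illustration.

```python
import subprocess
import time


def supervise(cmd, restart_delay=5.0, max_runs=None, sleep=time.sleep):
    """Run cmd repeatedly, waiting restart_delay seconds between runs.

    max_runs=None means "restart forever"; a number limits the runs, which
    is mainly useful for testing. Returns the number of runs performed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        if runs:
            # The daemon exited (for example via the added exit(0));
            # give the other ring members time to notice before rejoining.
            sleep(restart_delay)
        code = subprocess.call(cmd)
        runs += 1
        print(f"run {runs} exited with code {code}")
    return runs


# Hypothetical usage; the binary name and config path are placeholders:
# supervise(["spread", "-c", "/etc/spread.conf"])
```

Since exit(0) is indistinguishable from a clean shutdown, the wrapper restarts on any exit code. Using a distinct non-zero code for the workaround exit would make it possible to restart only in that case.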
Let me know if you need any additional info.
Bye,
Nico