[Spread-users] daemon crash

John Schultz jschultz at spreadconcepts.com
Sun Jul 17 18:00:44 EDT 2011


First, it looks like one or more of your daemons was connecting and disconnecting repeatedly.

Second, it looks like you tripped the infinite-EVS state bug that we tried to work around.  We still aren't sure what causes it, but most of the time simply restarting the protocol seems to fix it, and that is what the workaround does.
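
To illustrate the idea behind that workaround, here is a toy sketch in Python.  It is not Spread's code: the class name, the return values, and the limit of 500 are all assumptions (the limit is only a guess based on the log showing Token_rounds at 501 when the workaround fired).

```python
from dataclasses import dataclass

# Assumed limit, inferred from "Token_rounds: 501" in the log above the crash.
# The real threshold in the Spread source may differ.
MAX_EVS_ROUNDS = 500


@dataclass
class EvsGuard:
    """Hypothetical guard that counts token rounds spent in the EVS state."""

    token_rounds: int = 0

    def on_token(self) -> str:
        """Count one token round.

        Returns "forward" while the count is within the limit, and
        "swallow" once it is exceeded, at which point the counter is
        reset so the protocol can start over.
        """
        self.token_rounds += 1
        if self.token_rounds > MAX_EVS_ROUNDS:
            self.token_rounds = 0
            return "swallow"
        return "forward"


guard = EvsGuard()
decisions = [guard.on_token() for _ in range(501)]
print(decisions[499], decisions[500])
```

The point is only that the daemon stops passing the token after too many rounds and forces a restart, rather than spinning in the EVS state forever.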

The state at the end looks like a bug.  It has the ARU at 167 but the highest seq at 135.  Then, in the next membership that it does establish, it tries to issue packet #136.  Since that is lower than the ARU, a packet with that id already exists.  I'm not 100% sure about this, because during membership changes the token fields can mean different things than they do during regular operation.
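
The arithmetic above can be checked against the logged state with a small Python sketch.  The helper names and the regular expression are my own and are not part of Spread; the check also assumes that the fields carry their regular-operation meaning, which, as noted, may not hold during a membership change.

```python
import re

# Matches lines such as "[Thu 23 Jun 2011 08:30:01]      Aru:              167".
# Hex values (for example the token "type" field) are deliberately skipped.
STATE_LINE = re.compile(r"\]\s+(\w+):\s+(-?\d+)\s*$")


def parse_state(lines):
    """Collect the integer "name: value" fields from daemon log lines."""
    state = {}
    for line in lines:
        match = STATE_LINE.search(line)
        if match:
            state[match.group(1)] = int(match.group(2))
    return state


def next_packet_collides(state):
    """Return (next_seq, collides).

    next_seq is Highest_seq + 1.  It collides when it is not above Aru,
    because every sequence number up to Aru is already accounted for.
    """
    next_seq = state["Highest_seq"] + 1
    return next_seq, next_seq <= state["Aru"]


sample = [
    "[Thu 23 Jun 2011 08:30:01]      Aru:              167",
    "[Thu 23 Jun 2011 08:30:01]      My_aru:           167",
    "[Thu 23 Jun 2011 08:30:01]      Highest_seq:      135",
    "[Thu 23 Jun 2011 08:30:01]      type:             0x80040080",
]

state = parse_state(sample)
print(next_packet_collides(state))
```

With the values from the log this reports that the next packet would be 136 and that it collides, which matches the "created packet 136 already exist" message just before the exit.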

The infinite-EVS bug and this exit might be related or caused by the same logic issue.

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Jul 17, 2011, at 2:38 PM, Matt Garman wrote:


Hello,

We're using spread 4.0.0 on 64-bit CentOS 4 (Linux).

The other day a daemon crashed.  Below is what was logged just prior
to the crash.  I was wondering if anyone could help shed some light
on this?

Thanks,
Matt


Membership id is ( -1407973572, 1308835744)
[Thu 23 Jun 2011 08:29:03] --------------------
[Thu 23 Jun 2011 08:29:03] Configuration at lnxsvr1 is:
[Thu 23 Jun 2011 08:29:03] Num Segments 1
[Thu 23 Jun 2011 08:29:03]      4       172.20.7.63       4803
[Thu 23 Jun 2011 08:29:03]              lnxsvr1                 172.20.7.60
[Thu 23 Jun 2011 08:29:03]              lnxsvr2                 172.20.7.61
[Thu 23 Jun 2011 08:29:03]              lnxsvr6                 172.20.7.62
[Thu 23 Jun 2011 08:29:03]              lnxsvr5                 172.20.7.58
[Thu 23 Jun 2011 08:29:03] =====================
Membership id is ( -1407973572, 1308835795)
[Thu 23 Jun 2011 08:29:54] --------------------
[Thu 23 Jun 2011 08:29:54] Configuration at lnxsvr1 is:
[Thu 23 Jun 2011 08:29:54] Num Segments 1
[Thu 23 Jun 2011 08:29:54]      4       172.20.7.63       4803
[Thu 23 Jun 2011 08:29:54]              lnxsvr1                 172.20.7.60
[Thu 23 Jun 2011 08:29:54]              lnxsvr2                 172.20.7.61
[Thu 23 Jun 2011 08:29:54]              lnxsvr6                 172.20.7.62
[Thu 23 Jun 2011 08:29:54]              lnxsvr5                 172.20.7.58
[Thu 23 Jun 2011 08:29:54] =====================
Membership id is ( -1407973572, 1308835801)
[Thu 23 Jun 2011 08:30:00] --------------------
[Thu 23 Jun 2011 08:30:00] Configuration at lnxsvr1 is:
[Thu 23 Jun 2011 08:30:00] Num Segments 1
[Thu 23 Jun 2011 08:30:00]      4       172.20.7.63       4803
[Thu 23 Jun 2011 08:30:00]              lnxsvr1                 172.20.7.60
[Thu 23 Jun 2011 08:30:00]              lnxsvr2                 172.20.7.61
[Thu 23 Jun 2011 08:30:00]              lnxsvr6                 172.20.7.62
[Thu 23 Jun 2011 08:30:00]              lnxsvr5                 172.20.7.58
[Thu 23 Jun 2011 08:30:00] =====================
[Thu 23 Jun 2011 08:30:01] Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
[Thu 23 Jun 2011 08:30:01]      Aru:              167
[Thu 23 Jun 2011 08:30:01]      My_aru:           167
[Thu 23 Jun 2011 08:30:01]      Highest_seq:      135
[Thu 23 Jun 2011 08:30:01]      Highest_fifo_seq: 84
[Thu 23 Jun 2011 08:30:01]      Last_discarded:   0
[Thu 23 Jun 2011 08:30:01]      Last_delivered:   167
[Thu 23 Jun 2011 08:30:01]      Last_seq:         3468
[Thu 23 Jun 2011 08:30:01]      Token_rounds:     501
[Thu 23 Jun 2011 08:30:01] Last Token:
[Thu 23 Jun 2011 08:30:01]      type:             0x80040080
[Thu 23 Jun 2011 08:30:01]      transmiter_id:    -1407973572
[Thu 23 Jun 2011 08:30:01]      seq:              0
[Thu 23 Jun 2011 08:30:01]      proc_id:          -1407973572
[Thu 23 Jun 2011 08:30:01]      aru:              167
[Thu 23 Jun 2011 08:30:01]      aru_last_id:      -1407973572
[Thu 23 Jun 2011 08:30:01]      flow_control:     0
[Thu 23 Jun 2011 08:30:01]      rtr_len:          0
[Thu 23 Jun 2011 08:30:01]      conf_hash:        1007608523
Membership id is ( -1407973572, 1308835805)
[Thu 23 Jun 2011 08:30:01] --------------------
[Thu 23 Jun 2011 08:30:01] Configuration at lnxsvr1 is:
[Thu 23 Jun 2011 08:30:01] Num Segments 1
[Thu 23 Jun 2011 08:30:01]      4       172.20.7.63       4803
[Thu 23 Jun 2011 08:30:01]              lnxsvr1                 172.20.7.60
[Thu 23 Jun 2011 08:30:01]              lnxsvr2                 172.20.7.61
[Thu 23 Jun 2011 08:30:01]              lnxsvr6                 172.20.7.62
[Thu 23 Jun 2011 08:30:01]              lnxsvr5                 172.20.7.58
[Thu 23 Jun 2011 08:30:01] =====================
[Thu 23 Jun 2011 08:30:01] Send_new_packets: created packet 136 already exist 2
Exit caused by Alarm(EXIT)


_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users
