[Spread-users] Send_new_packets: created packet 203 already exist 2

John Schultz jschultz at spreadconcepts.com
Wed Feb 22 14:04:28 EST 2012


Hi Matt,

As you know, these kinds of issues have been lingering for some time now.

We suspect that a bug of some sort is corrupting the internal state of the daemons or driving it into an illegal state.  The problem eventually manifests itself when a daemon later detects that some invariant has been violated and commits suicide.  It shows up in different places for different people at different times, but the underlying cause is likely the same for many of these reports.  It seems to occur more often for people who have "flaky" networks (e.g., higher loss than normal, asymmetric communication) and high message rates.

Unfortunately, using something like valgrind to try to catch invalid memory accesses is unlikely to help: the overhead is so high that the daemon no longer behaves as it does when running natively, so the bug may not show up at all (i.e., a Heisenbug).  We believe the only way we will ultimately squash this bug (or bugs) is to add more internal validity checks and internal state dumps, and to do code reviews to try to spot the cause.
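
As an illustration of the kind of internal validity checking and state dumping mentioned above, here is a minimal sketch. It is written in Python for brevity rather than in the daemon's C, the names RingState, check_state and validate_or_dump are hypothetical, and the two invariants are assumptions chosen for the example, not confirmed rules of the Spread protocol. The field names simply mirror the state dump in the log quoted below.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class RingState:
    # Hypothetical subset of protocol state; names mirror the logged state dump.
    aru: int
    my_aru: int
    highest_seq: int
    last_delivered: int


def check_state(state: RingState) -> List[str]:
    """Return one message per violated invariant (empty list if none)."""
    violations = []
    # Assumed invariant: nothing is acknowledged beyond the highest sequence seen.
    if state.aru > state.highest_seq:
        violations.append(f"aru {state.aru} > highest_seq {state.highest_seq}")
    # Assumed invariant: nothing is delivered beyond what was received locally.
    if state.last_delivered > state.my_aru:
        violations.append(
            f"last_delivered {state.last_delivered} > my_aru {state.my_aru}"
        )
    return violations


def validate_or_dump(state: RingState, where: str) -> None:
    """Fail early, with a full state dump, at the first point a check trips."""
    violations = check_state(state)
    if violations:
        dump = ", ".join(f"{key}={value}" for key, value in asdict(state).items())
        problems = "; ".join(violations)
        raise AssertionError(f"{where}: {problems} [{dump}]")
```

Calling validate_or_dump at the entry and exit of each state transition would narrow down where the state first goes bad, rather than where the damage is eventually noticed. Fed the values from the quoted dump (Aru 241, Highest_seq 200), the first check would fire, though whether that relation is actually illegal in EVS state is not established here.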

We are currently wrapping up a release candidate for the next version of Spread (4.2).  After that release is officially out, we intend to turn our attention to this issue and try to nail down what is happening here.

Since you are regularly running into this issue, and it has proven very hard for us to reproduce in our test environment, it might be helpful if we could deploy test versions in your environment.  Would you be open to that kind of arrangement?

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Feb 22, 2012, at 1:40 PM, Matt Garman wrote:

Hi,

I asked about this back in May 2008 [1], but we never really came to
any resolution.

As a refresher, we're getting regular spread daemon crashes (it went
away for a while, but has recently become a very regular occurrence,
as in several times/day).  We're using spread version 4.00.00,
self-compiled on CentOS 5.6.

The log leading up to the crash looks like this:

[Wed 22 Feb 2012 12:04:58] Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
[Wed 22 Feb 2012 12:04:58]      Aru:              241
[Wed 22 Feb 2012 12:04:58]      My_aru:           241
[Wed 22 Feb 2012 12:04:58]      Highest_seq:      200
[Wed 22 Feb 2012 12:04:58]      Highest_fifo_seq: 103
[Wed 22 Feb 2012 12:04:58]      Last_discarded:   0
[Wed 22 Feb 2012 12:04:58]      Last_delivered:   241
[Wed 22 Feb 2012 12:04:58]      Last_seq:         3533
[Wed 22 Feb 2012 12:04:58]      Token_rounds:     501
[Wed 22 Feb 2012 12:04:58] Last Token:
[Wed 22 Feb 2012 12:04:58]      type:             0x80040080
[Wed 22 Feb 2012 12:04:58]      transmiter_id:    -1407973572
[Wed 22 Feb 2012 12:04:58]      seq:              0
[Wed 22 Feb 2012 12:04:58]      proc_id:          -1407973572
[Wed 22 Feb 2012 12:04:58]      aru:              241
[Wed 22 Feb 2012 12:04:58]      aru_last_id:      -1407973572
[Wed 22 Feb 2012 12:04:58]      flow_control:     0
[Wed 22 Feb 2012 12:04:58]      rtr_len:          0
[Wed 22 Feb 2012 12:04:58]      conf_hash:        1007608523
Membership id is ( -1407973572, 1329934005)
[Wed 22 Feb 2012 12:04:58] --------------------
[Wed 22 Feb 2012 12:04:58] Configuration at lnxsvr1 is:
[Wed 22 Feb 2012 12:04:58] Num Segments 1
[Wed 22 Feb 2012 12:04:58]      4       172.20.7.63       4803
[Wed 22 Feb 2012 12:04:58]              lnxsvr1                 172.20.7.60
[Wed 22 Feb 2012 12:04:58]              lnxsvr2                 172.20.7.61
[Wed 22 Feb 2012 12:04:58]              lnxsvr6                 172.20.7.62
[Wed 22 Feb 2012 12:04:58]              lnxsvr5                 172.20.7.58
[Wed 22 Feb 2012 12:04:58] ====================
[Wed 22 Feb 2012 12:04:58] Send_new_packets: created packet 203 already exist 2
Exit caused by Alarm(EXIT)
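
The negative transmiter_id, proc_id and aru_last_id values in the dump above look like IPv4 addresses printed as signed 32-bit integers. A minimal sketch for decoding them, assuming that interpretation and most-significant-byte-first order (the helper name id_to_ipv4 is made up for illustration):

```python
import ipaddress


def id_to_ipv4(signed_id: int) -> str:
    """Interpret a signed 32-bit integer as a dotted-quad IPv4 address."""
    # Mask to 32 bits to undo the two's-complement sign.
    unsigned = signed_id & 0xFFFFFFFF
    # IPv4Address treats an integer as most-significant byte first.
    return str(ipaddress.IPv4Address(unsigned))


print(id_to_ipv4(-1407973572))  # prints 172.20.7.60
```

Under that assumption, -1407973572 decodes to 172.20.7.60, which the configuration dump lists as lnxsvr1.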

Any thoughts?

Thanks,
Matt


[1] http://lists.spread.org/pipermail/spread-users/2008-May/003824.html

_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users
