[Spread-users] Leader crashing often

John Schultz jschultz at spreadconcepts.com
Fri Nov 2 14:43:27 EDT 2007


Ed,

I think you can truncate a membership message that is too big without 
harming the daemon's proper functioning.

However, your applications are certainly not going to get the full 
membership info and they need to be resilient to that.  In addition, you 
will probably need to examine the client-daemon protocol for transmitting 
membership messages and make sure that your truncation matches whatever 
header calculations the client does upon receipt.

For example, I think that if you simply stop writing to a membership 
message when you are about to exceed its size, then that won't work. If 
you did that, then I imagine the client-daemon protocol would get out of 
sync because the client would be expecting more data from the daemon 
(according to its header calculations) than the daemon will ever send.

So you also need to figure out what header fields need to be modified so 
that the client-daemon protocol doesn't get thrown out of sync.
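
A minimal sketch of that idea in C follows. It is not Spread's actual wire 
format: the memb_header layout, the field names num_members and data_len, 
the fixed NAME_LEN slot, and the helper build_truncated_membership are all 
hypothetical. The only point it illustrates is that the count and length 
fields are filled in after truncation, so they describe exactly the bytes 
that get written and the reader never waits for data that will not arrive.

```c
#include <stdint.h>
#include <string.h>

#define NAME_LEN 32   /* hypothetical fixed-width slot per member name */

/* Hypothetical header; NOT Spread's real client-daemon header. */
struct memb_header {
    uint32_t num_members;  /* members actually present in the body */
    uint32_t data_len;     /* body length in bytes */
};

/*
 * names is a packed array of total_members slots, NAME_LEN bytes each.
 * Copy as many whole slots as fit into buf (capacity buf_cap bytes,
 * header included), then fill in the header from what was really written.
 * Returns total bytes used, or 0 if even the header does not fit.
 */
size_t build_truncated_membership(char *buf, size_t buf_cap,
                                  const char *names, size_t total_members)
{
    struct memb_header hdr;
    size_t room, fit;

    if (buf_cap < sizeof hdr)
        return 0;

    room = buf_cap - sizeof hdr;
    fit = room / NAME_LEN;            /* whole names only, no partial slot */
    if (fit > total_members)
        fit = total_members;

    memcpy(buf + sizeof hdr, names, fit * NAME_LEN);

    /* Header reflects the truncated body, not the full member list. */
    hdr.num_members = (uint32_t)fit;
    hdr.data_len = (uint32_t)(fit * NAME_LEN);
    memcpy(buf, &hdr, sizeof hdr);

    return sizeof hdr + fit * NAME_LEN;
}
```

In the real daemon the equivalent change would have to be made to whichever 
header fields the client library actually uses to size its reads.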

Cheers!

---
John Schultz
Spread Concepts
Phn: 443 838 2200

On Fri, 2 Nov 2007, Ed Holyat wrote:

> John, in my case I have MAX_GROUP_NAME==255, which is a requirement
> for our app.
> All daemons are at this size.  Everything works fine until this buffer
> is blown, which happens at < 200 members.  John, I understand why it is
> happening; I would just like to know if the truncation is ok.
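
The two figures mentioned in this thread (about 1500 members at the default 
name size of 32, and fewer than 200 members at 255) are consistent with a 
simple capacity estimate. The sketch below assumes, as a simplification not 
taken from the Spread source, that a membership message has a fixed byte 
budget and that each member costs roughly one name slot, ignoring all 
header overhead. The function names are made up for illustration.

```c
#include <stddef.h>

/* Byte budget implied by a known (members, name_size) pair. */
size_t implied_budget(size_t members, size_t name_size)
{
    return members * name_size;
}

/* Rough upper bound on members that fit in budget_bytes. */
size_t max_members(size_t budget_bytes, size_t name_size)
{
    return name_size ? budget_bytes / name_size : 0;
}
```

Under those assumptions, implied_budget(1500, 32) is 48000 bytes, and 
max_members(48000, 255) is 188, which matches the observation that the 
buffer overflows somewhere below 200 members.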
>
>
>
> -----Original Message-----
> From: spread-users-bounces at lists.spread.org
> [mailto:spread-users-bounces at lists.spread.org] On Behalf Of John Schultz
> Sent: Friday, November 02, 2007 11:54 AM
> To: spread-users at lists.spread.org
> Subject: RE: [Spread-users] Leader crashing often
>
> I believe Ed is correct.  This problem can occur when the daemon tries to
> construct a message that is too large to fit in a single message.  This
> can occur if the number of members in a group is too large to fit in a
> message -- but, by default (MAX_GROUP_SIZE=32), this should take something
> like having a group with at least 1500 members in it (worst case).
>
> The only other way I can think this could happen is if you changed the
> default MAX_GROUP_SIZE, as Ed pointed out, or if the client and daemon
> were compiled with different ideas of what MAX_MESSAGE_SIZE is.
> Normally, clients shouldn't be able to inject any messages that would
> cause the daemon to run out of space in allocating a user message.
>
> So, if you have messages that you send to lots and lots of private groups,
> this could be the problem if the client and daemon have different
> conceptions of MAX_GROUP_SIZE.
>
> Cheers!
> John
>
> ---
> John Schultz
> Spread Concepts
> Phn: 443 838 2200
>
> On Fri, 2 Nov 2007, Ed Holyat wrote:
>
>> It is not necessarily the number of members in the group.  The crash is
>> related to the number of members * the size of the private names
>> associated with that group.  This cannot exceed the vector space that
>> sends this information out, which will hold at most
>>
>> 100 ( num_elements ) * MAX_PACKET_SIZE
>>
>> -----Original Message-----
>> From: Nico Meyer [mailto:nmeyer at virtualminds.de]
>> Sent: Friday, November 02, 2007 10:31 AM
>> To: Ed Holyat
>> Cc: spread-users at lists.spread.org
>> Subject: Re: [Spread-users] Leader crashing often
>>
>> But does this relate to the total number of members summed over all
>> groups, or only the number of groups a certain message is directed to?
>> Because we might have a high number of clients which are in no group
>> (aside from their private group of course), but we certainly have no
>> group with 100 members.
>>
>>
>> On Friday 02 November 2007 15:13:30 Ed Holyat wrote:
>>> This seems to be related to the issue I am also experiencing.
>>>
>>> See subject RE: [Spread-users] To protect against SEGV in scatter on the
>>> mailing list.
>>> I attached a fix; the fix truncates the member list when the number of
>>> elements is exceeded.
>>> I am waiting to find out what the side effects of truncating the member
>>> list are.  The correct fix may be to send two messages so the member list
>>> is not truncated.
>>>
>>> -----Original Message-----
>>> From: spread-users-bounces at lists.spread.org
>>> [mailto:spread-users-bounces at lists.spread.org] On Behalf Of Nico Meyer
>>> Sent: Friday, November 02, 2007 4:45 AM
>>> To: spread-users at lists.spread.org
>>> Subject: [Spread-users] Leader crashing often
>>>
>>> Hello,
>>>
>>> lately one of our spread daemons started to crash very often with the
>>> following log message:
>>>
>>> spread: message.c:398: Message_add_scat_element: Assertion
>>> `msg->num_elements < 100' failed.
>>>
>>> The only thing special about this one daemon is that it normally is the
>>> leader.
>>> Does anybody know what can cause this?
>>>
>>> Thank you,
>>>
>>> Nico
>>>
>>> _______________________________________________
>>> Spread-users mailing list
>>> Spread-users at lists.spread.org
>>> http://lists.spread.org/mailman/listinfo/spread-users



