[Spread-users] BUFFER_TOO_SHORT && endian_mismatch >= 0

John Schultz jschultz at d-fusion.net
Wed Jul 3 16:04:24 EDT 2002

Just a quick shot in the dark:

Are you sure that you are using ints instead of unsigneds as the return 
parameters? Your compiler should warn you if you are passing unsigneds, 
but this could explain why endian comes back as non-negative when it 
should be negative.
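
To illustrate what I mean, here is a minimal sketch (not your code, and 
not the Spread API). store_negative_size() is a made-up stand-in for any 
call that reports a required size as a negative number through an int 
out-parameter. The two callers differ only in how they declare the 
variable that receives it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a library call that reports a required
   size as a negative number through an int out-parameter. */
static void store_negative_size(int *out, int required)
{
    assert(out != NULL);
    *out = -required;
}

/* Caller declares the variable as int: the sign survives. */
long long seen_as_signed(int required)
{
    int endian = 0;
    store_negative_size(&endian, required);
    return (long long)endian;
}

/* Caller declares the variable as unsigned and casts the pointer
   (or ignores the warning): the same bits now read as a large
   positive number, so a test like "endian >= 0" is always true. */
long long seen_as_unsigned(int required)
{
    unsigned int endian = 0;
    store_negative_size((int *)&endian, required);
    return (long long)endian;
}
```

With required = 100, the first caller sees -100 and the second sees a 
large positive value, which would send you down the "endian >= 0" branch.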

It looks like your num_groups also sometimes gets messed up, even on 
successful calls to receive?

What kind of values are you getting back for the different parameters 
when things fail (I know you said you don't know, but maybe you could 
add in some more diagnostic code)? Are they completely off the wall, or 
could they be correct but with the wrong sign, or zero?
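
As a rough idea of the kind of diagnostic code I have in mind, something 
like the helper below could be called right after each receive that 
fails. format_receive_diag() is a name I made up; the arguments are just 
the variables from your loop, and you would send the resulting line to 
whatever log your users can mail back to you:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the return value and out-parameters of
   one receive call into buf. Returns what snprintf returns, i.e. the
   number of characters the full line needs (excluding the NUL). */
int format_receive_diag(char *buf, size_t buflen,
                        int size, int num_groups, int endian,
                        int max_groups, int bufsize)
{
    assert(buf != NULL && buflen > 0);
    /* Print endian in hex as well, so a sign problem is easy to spot. */
    return snprintf(buf, buflen,
                    "size=%d num_groups=%d max_groups=%d "
                    "endian=%d (0x%08x) bufsize=%d",
                    size, num_groups, max_groups,
                    endian, (unsigned int)endian, bufsize);
}
```

Seeing the raw values next to each other should tell you quickly whether 
they are garbage, zero, or plausible numbers with the wrong sign.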

If several parameters come back messed up on different occasions, I 
would guess that you might be getting stack or heap corruption. It looks 
like you are using multiple threads. Are you sure that different threads 
aren't trying to use the same variables (i.e., is each thread using its 
own local stack variables)? Is it possible that a parallel memcpy or 
something similar could be overwriting that portion of your data?
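
Here is a small sketch of the difference I am asking about. The struct 
and both functions are invented for illustration; they only stand for 
"the out-parameters of one receive call". The first pattern hands every 
caller (and so every thread) the same storage, while the second leaves 
each caller with its own copy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the out-parameters for one receive call. */
struct recv_params {
    int svc_type;
    int num_groups;
    int msg_type;
    int endian;
};

/* Risky pattern: a single static instance. Every caller gets the same
   address, so two threads receiving at once can overwrite each other. */
struct recv_params *shared_params(void)
{
    static struct recv_params p;
    return &p;
}

/* Safer pattern: the caller owns the storage, typically a local
   variable on its own stack, and passes it in to be initialized. */
void init_params(struct recv_params *p)
{
    assert(p != NULL);
    p->svc_type = 0;
    p->num_groups = 0;
    p->msg_type = 0;
    p->endian = 0;
}
```

If anything in your receive path looks like the first pattern (statics, 
globals, or fields on an object shared between threads), that is where I 
would look first.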

Just some thoughts...

John Schultz
Co-Founder, Lead Engineer
D-Fusion, Inc. (http://www.d-fusion.net)
Phn: 443-838-2200 Fax: 707-885-1055

Tim Peters wrote:

> [Jonathan Stanton]
>>endian_mismatch can in general be negative, 0 or 1. The negative cases
>>only occur when the BUFFER_TOO_SHORT error also occurs. Otherwise
>>endian_mismatch should be 0 or 1. It will be 0 for membership messages
>>and when the endianness of the sender is the same as that of the
>>receiver. It will be 1 when the sending machine (where the client runs)
>>has a different endianness from the receiving machine (where the
>>receiving client runs).
>>So I'm not sure if that answers the question. If you are saying that
>>endian_mismatch is 0 when you also got a BUFFER_TOO_SHORT error then
>>that shouldn't occur I think.
> Us either <wink>.  We don't know what the endian_mismatch value is
> specifically, only that SP_receive() returned BUFFER_TOO_SHORT and
> endian_mismatch was not negative then.  The code is like this:
> 	for (;;) {
> 		size = SP_receive(self->mbox, &svc_type,
> 				  senderbuffer,
> 				  max_groups, &num_groups, groups,
> 				  &msg_type, &endian,
> 				  bufsize, pbuffer);
> 		if (size >= 0) {
> 			if (num_groups < 0) {
> 				/* XXX This really happens!
> 				   Don't dare retry the receive, since we
> 				   didn't get an error.  The extra names
> 				   are forever lost. */
> 				num_groups = max_groups;
> 			}
> 			break;
> 		}
> 		if (size == BUFFER_TOO_SHORT) {
> 			if (endian >= 0)
> 				goto set_error;      /* THIS BRANCH IS GETTING TAKEN */
> 			bufsize = -endian;
> 			Py_XDECREF(data);
> 			data = PyString_FromStringAndSize(NULL, bufsize);
> 			if (data == NULL)
> 				goto error;
> 			pbuffer = PyString_AS_STRING(data);
> 			continue;
> 		}
> 		if (size == GROUPS_TOO_SHORT) {
> 			if (num_groups >= 0)
> 				goto set_error;
> 			max_groups = -num_groups;
> 			...
>>You can get a GROUPS_TOO_SHORT with an endian_mismatch of 0 though. That
>>just means that the groups buffer is too short but the main message body
>>buffer is big enough.
> I believe that.  I cut off our GROUPS_TOO_SHORT code above because it isn't
> relevant, but it doesn't look at endian_mismatch.
>>Does this help?
> A little, in confirming that what we're seeing isn't possible <wink>.  Our
> evidence comes from crash log files at a user's site; we haven't yet been
> able to provoke it ourselves.  Thanks for the confirmation!  More when we
> know more.