[Spread-users] Error

Jonathan Stanton jonathan at cnds.jhu.edu
Fri Jan 18 17:04:37 EST 2002

On Fri, Jan 18, 2002 at 09:58:56AM -0500, Guido van Rossum wrote:
> > Still, Guido saw some other failures that didn't fit this scenario, like
> > "Sess_validate_read_header: Message has negative or too large num_groups
> > field" (btw, the Alarm that displays this is missing a %d in its format
> > string, so it doesn't show the offending num_groups value also passed to the
> > Alarm call).
> Could that one have been caused by the disconnects too?  This part of
> Spread's session.c code is a bit too hairy for me to follow (and my
> motivation is low now that we've nailed our problem :-), but couldn't
> a short read have caused this?

This is possible. It is not quite a short read on the daemon side, but I
have seen it happen when (because of a bug) a client multicast did not
complete and a second mcast was then sent on the same Spread connection,
so the message the daemon received looked something like this:

sender: header1......header2......body2
recv  : header1..............body1

Here header1 was incomplete, so the receiver saw headers 1 and 2 mixed
together.

I fixed the bug that caused this earlier, but I can imagine there might
be another way to trigger it.
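
The mixed-header case comes from a sender leaving a partly written
message on a stream connection and then starting the next one. Below is
a minimal sketch of the usual guard. It assumes a plain blocking stream
descriptor, and the helper name write_all is hypothetical, not part of
libsp. The loop keeps calling write() until every byte of the message is
out. Any error is treated as fatal for the connection, instead of
starting another message on it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of libsp): write exactly len bytes to fd.
 * Returns 0 on success, -1 on failure. After a failure part of the message
 * may already be on the wire, so the caller must close the connection
 * instead of starting another message on it; otherwise the receiver can
 * parse the next header as the tail of this one. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;              /* real error: connection is unusable */
        p += n;                     /* short write: advance and keep going */
        len -= (size_t)n;
    }
    return 0;
}
```

The important part is the caller's reaction to -1. Closing the
connection costs a reconnect, but it never lets a half-sent header be
followed by a fresh one on the same stream.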

> > We'd love to hear anything about possible problems with
> > multithreaded apps regardless.
> Indeed.  (The only one we know about is that disconnects in one thread
> can cause arbitrary trouble for other threads.)

The problem I remember did have to do with thread behavior when a
disconnect occurred on a socket: there are races when one thread gets a
socket error and closes the socket (in the libsp code) while other
threads are still trying to use that socket. I think it actually only
happened when the connection was immediately re-established and the
socket number (fd) got reused. We know how to fix it; I just don't
recall whether we have already integrated the fix.

I'll be back in Baltimore Monday morning and will check the details
then. The bottom line, though, is that if you make sure the disconnected
connection is not reused by other threads, you should be OK.
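
Below is one way an application could follow that advice. It is a
sketch under stated assumptions, not libsp's API: shared_conn,
conn_init, conn_send and conn_close are hypothetical names. Every thread
goes through one mutex-protected wrapper. The wrapper marks the
connection dead (fd = -1) when it is closed or a send fails, so no
thread can touch the old descriptor number after the disconnect.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper (not libsp's API) for one connection shared by
 * several threads. Every use of fd happens under lock, and a closed
 * connection is marked with fd = -1, so no thread can reach the old
 * descriptor number after the disconnect, even if the OS has already
 * handed that number to a newly opened socket. */
struct shared_conn {
    pthread_mutex_t lock;
    int fd;                            /* -1 once disconnected */
};

void conn_init(struct shared_conn *c, int fd)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->fd = fd;
}

/* Close the descriptor and mark the connection dead. Caller holds lock. */
static void conn_close_locked(struct shared_conn *c)
{
    if (c->fd >= 0) {
        close(c->fd);
        c->fd = -1;
    }
}

/* Send one whole message. Holding the lock for the entire write also
 * stops two threads from interleaving their messages on one stream. */
int conn_send(struct shared_conn *c, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;

    pthread_mutex_lock(&c->lock);
    if (c->fd < 0)
        rc = -1;                       /* already disconnected: refuse */
    while (rc == 0 && len > 0) {
        ssize_t n = write(c->fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;                  /* interrupted by a signal: retry */
        if (n <= 0) {
            conn_close_locked(c);      /* error: tear down, never reuse */
            rc = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&c->lock);
    return rc;
}

void conn_close(struct shared_conn *c)
{
    pthread_mutex_lock(&c->lock);
    conn_close_locked(c);
    pthread_mutex_unlock(&c->lock);
}
```

With this design a reconnect should allocate a fresh shared_conn rather
than store a new fd in the old one. Threads still holding a pointer to
the old connection then get -1 instead of silently writing to the new
socket. The sketch guards sends only; a thread blocked in read() on the
same descriptor needs separate handling.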


Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
