[Spread-users] Error

Tim Peters tim at zope.com
Sat Jan 19 06:41:32 EST 2002


[Jonathan Stanton]
> The problem I remember did have to do with thread behavior when a
> disconnect occurred on a socket. The problem is that there are races
> when one thread gets a socket error and closes a socket (in the
> libsp code) and other threads are also trying to use that socket. I
> think it actually only happened when the socket was immediately
> reconnected and the socket number (fd) got reused. We know how to
> fix it, and I just don't recall if we have already integrated the
> fix or not.

I'm pretty sure we've seen this happen under 3.16.1, so I don't think a fix
has been released yet.

[Guido van Rossum]
> I find it kind of strange that Spread closes the socket file
> descriptor; it would have been safer for the user if it just marked
> that mbox as "bad" without actually closing it (the reason being the
> file descriptor reuse case you describe).  I had to put a bandaid
> around this problem in the Python wrapper (this bandaid isn't on the
> distribution on the web yet).

"The bandaid" is to set our own mbox wrapper object's disconnected flag to
true upon seeing CONNECTION_CLOSED or ILLEGAL_SESSION come back from Spread,
right?  Alas, that doesn't really solve it, just makes it more unlikely:
because we release the global interpreter lock around the Spread API calls,
this (for example) is possible:

Thread A                         Thread B
calls Python mbox.receive()
passes self->disconnected check
releases GIL
                                 calls Python mbox.multicast()
                                 passes self->disconnected check
calls Spread SP_receive()
                                 releases GIL
reacquires GIL
sees CONNECTION_CLOSED
sets self->disconnected
           *** an arbitrarily long time can pass here ***
                                 calls Spread SP_multicast(), now with
                                     a recycled mbox descriptor
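
The interleaving above can be replayed deterministically in a few lines of
Python. This is only a sketch: FakeSpread, Mbox and replay() are made-up
stand-ins, not the real libsp or wrapper API, and the sketch assumes the
library hands out the lowest free descriptor, much as the OS does for fds.

```python
class FakeSpread:
    """Hypothetical stand-in for libsp; reuses the lowest free descriptor."""

    def __init__(self):
        self.owners = {}   # descriptor -> name of the session that owns it
        self.sent = []     # (owner, message) pairs, in send order

    def connect(self, owner):
        fd = 0
        while fd in self.owners:
            fd += 1
        self.owners[fd] = owner
        return fd

    def receive_fails(self, fd):
        # Models SP_receive() hitting a socket error: the library closes
        # the descriptor itself before reporting the error.
        del self.owners[fd]
        return "CONNECTION_CLOSED"

    def multicast(self, fd, message):
        # The message goes to whoever owns the descriptor right now.
        self.sent.append((self.owners[fd], message))


class Mbox:
    """Hypothetical wrapper object carrying only a disconnected flag."""

    def __init__(self, spread, owner):
        self.fd = spread.connect(owner)
        self.disconnected = False


def replay():
    spread = FakeSpread()
    mbox = Mbox(spread, "old session")

    a_passed = not mbox.disconnected   # Thread A passes the check
    b_passed = not mbox.disconnected   # Thread B passes the check

    # Thread A: the Spread call fails, so the flag is set (too late for B).
    if a_passed and spread.receive_fails(mbox.fd) == "CONNECTION_CLOSED":
        mbox.disconnected = True

    # An arbitrarily long time passes; a reconnect recycles the descriptor.
    Mbox(spread, "new session")

    # Thread B: acts on its stale check and uses the recycled descriptor.
    if b_passed:
        spread.multicast(mbox.fd, "hello")
    return spread.sent
```

Here replay() returns [("new session", "hello")]: thread B's message lands
on the new session even though the flag was set before B made its call.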

That's what I was groping at when I said earlier we'd have to put our
Python-level mbox calls under protection of a mutex.  Alternatively, and
until this problem is fixed in Spread, it would be better if we stopped
releasing the GIL in our wrapper (then the sequence of checking
self->disconnected, making a Spread call, and possibly setting
self->disconnected upon error, would be indivisible).
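
As a rough illustration of the mutex alternative, here is a minimal Python
sketch. LockedMbox, DisconnectedError and the raw object with receive() and
multicast() methods are hypothetical names, and the two error constants are
placeholders rather than Spread's real numeric codes.

```python
import threading

# Placeholder values for illustration; not Spread's real error codes.
CONNECTION_CLOSED = "CONNECTION_CLOSED"
ILLEGAL_SESSION = "ILLEGAL_SESSION"


class DisconnectedError(Exception):
    """Raised when the mbox is, or has just become, unusable."""


class LockedMbox:
    """Hypothetical wrapper: check, call and flag update under one lock."""

    def __init__(self, raw):
        self._raw = raw                  # assumed to expose receive/multicast
        self._lock = threading.Lock()
        self.disconnected = False

    def _call(self, method, *args):
        # Holding the lock across all three steps makes them indivisible
        # with respect to other threads using this same wrapper.
        with self._lock:
            if self.disconnected:
                raise DisconnectedError("mbox is disconnected")
            result = method(*args)
            if result in (CONNECTION_CLOSED, ILLEGAL_SESSION):
                self.disconnected = True
                raise DisconnectedError(result)
            return result

    def receive(self):
        return self._call(self._raw.receive)

    def multicast(self, message):
        return self._call(self._raw.multicast, message)
```

The obvious cost is that a blocking receive() holds the lock, so a
multicast() on the same mbox from another thread waits until it returns.
Not releasing the GIL has a similar effect, except that every Python thread
waits rather than only the callers of this one mbox.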





