[Spread-users] Spread python module
tim at zope.com
Thu Apr 25 12:13:54 EDT 2002
> I agree with this statement of the problem and we will be adding the new
> disconnect semantics soon.
> I just want to make two comments about the threaded client discussion.
> Is the Python module linking with libsp or libtsp (the regular or
> threadsafe Spread C-library)? I'm guessing it is the regular libsp
> since you do your own locking.
It's libtsp, and we're not trying to serialize Spread calls. We're trying
to make this sequence indivisible:
1 If this mbox hasn't already been disconnected:
2 Call a Spread operation on this mbox.
3 If that returns a "and Spread closed the socket" code:
4 Mark the mbox as being disconnected.
If that's not indivisible, thread A can pass the #1 "disconnected?" check,
thread B can sneak in and set the disconnected flag, and then thread A can
call the Spread operation even though the mbox got disconnected (between
steps 1 and 2).
We make that indivisible with a per-mbox lock around the sequence. So it
doesn't serialize calls to Spread in general, but it does have the effect of
serializing Spread calls with respect to a given mbox. The per-mbox Spread
serialization wasn't a goal, just an unhappy consequence of making the
checkflag-work-setflag sequence indivisible.
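A minimal sketch of that pattern, assuming a hypothetical wrapper class; the names (`MboxWrapper`, `CONNECTION_CLOSED`) and the error-code value are illustrative, not the module's actual API:

```python
import threading

CONNECTION_CLOSED = -8  # illustrative "Spread closed the socket" code


class MboxWrapper:
    """Illustrative wrapper: one lock per mbox guards the
    check-flag / call / set-flag sequence."""

    def __init__(self, raw_op):
        self._raw_op = raw_op          # stand-in for a Spread C call
        self._disconnected = False
        self._lock = threading.Lock()  # per-mbox, not global

    def call(self, *args):
        with self._lock:               # makes steps 1-4 indivisible
            if self._disconnected:                 # step 1
                raise RuntimeError("mbox disconnected")
            result = self._raw_op(*args)           # step 2
            if result == CONNECTION_CLOSED:        # step 3
                self._disconnected = True          # step 4
            return result
```

Because the lock is held for the duration of `_raw_op`, two threads calling through the same wrapper are serialized with respect to that mbox, which is exactly the unintended side effect described above.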
> I can understand just removing the lock which causes the deadlock, but
> because the lock removal can cause 'unexplainable' behavior with
> connections being reused inappropriately, it seems like a documented
> deadlock is safer (since there is a workaround for the deadlock by using
> select). Then when the spread library is fixed, a new release of the
> Python module could relax the lock to regain the lost concurrency and
> negate the deadlock possibility. It is certainly your call; that just
> seemed safer to me.
The select isn't really a cure, without piling more hard-to-understand
restrictions on usage. For example, if our apps had more than one thread in
a process reading from a single mbox (they don't), they could all see a
"ready!" indication from select and so all try to receive. The first one in
would succeed, and the rest could be left hanging forever.
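The hazard can be seen even without threads: one readable event only guarantees one successful receive. A sketch using a local socket pair (plain `select` semantics, nothing Spread-specific):

```python
import select
import socket

a, b = socket.socketpair()
b.send(b"one message")

# Two would-be receivers both poll the same socket: both see "ready!".
r1, _, _ = select.select([a], [], [], 0)
r2, _, _ = select.select([a], [], [], 0)
assert r1 == [a] and r2 == [a]   # both believe a receive will succeed

a.recv(1024)                     # the first receiver drains the data

# The second receiver's "ready" indication is now stale; a blocking
# receive here would hang forever.  With multiple threads, that is
# exactly the "left hanging" scenario.
r3, _, _ = select.select([a], [], [], 0)
assert r3 == []                  # nothing left to read
a.close()
b.close()
```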
If we get rid of the per-mbox locking, Guido estimates the odds of seeing a
problem as too low to worry about. I'm not so sure, and I remember that you
pointed out this potential problem to us before we knew about it -- I
figured you saw it happen in real life.
Our own apps should again be immune, because we kill the whole process when
an mbox disconnects (and another watcher process restarts us then). But
*introducing* the per-mbox locks didn't hurt our apps either. So I don't
take much comfort from the fact that our own apps won't mind losing the
lock; we already know other apps using our Spread wrapper aren't so
defensively coded.
BTW, I'd rather you spent your time changing Spread's disconnection
semantics than replying to this <wink>.