[Spread-users] Question about thread-safety
Yair Amir
yairamir at cnds.jhu.edu
Fri Jun 13 16:40:40 EDT 2003
Hi,
As far as I remember, user.c is one such example. It can be
compiled to create sptuser, which demonstrates a receiving thread
(as opposed to the event-driven style of the regular spuser).
It is the same user.c source, but look at what code is
generated when _REENTRANT is defined.
Cheers,
:) Yair.
On Friday, June 13, 2003 3:20 PM
White stwhit Stuart.White at acxiom.com wrote:
White> Hi John,
White> Thanks for your comments. I am #defining _REENTRANT and linking
White> libtspread.so. I am not closing/opening multiple connections, so I don't
White> think I'm running into your file descriptor-reuse issue.
White> I decided to try locking a mutex around all SP_* calls to try to resolve the
White> problem, but it occurs to me that I cannot, because calls to SP_receive()
White> will block until a message is available. If I lock a mutex around all SP_*
White> calls, you can see that this could easily create a deadlock situation. A
White> call to SP_receive will block (with the mutex locked) until a message
White> becomes available, but none ever will, because the "sender" thread attempts
White> to lock the same mutex (and blocks) before he calls SP_multicast().
White> Perhaps I'm experiencing one of the non-user errors you mentioned, such as
White> socket failure. Bummer.
White> Are there any resources/example programs which demonstrate correct spread
White> usage in threaded C applications?
White> Thanks!
White> -----Original Message-----
White> From: John Schultz [mailto:jschultz at d-fusion.net]
White> Sent: Friday, June 13, 2003 1:31 PM
White> To: spread-users at lists.spread.org
White> Subject: Re: [Spread-users] Question about thread-safety
White> NOTE: if anyone knows how to instruct Unix/Linux systems not to reuse
White> file descriptor IDs in a process, could you please email me or the list
White> a good reference (with page #)? Thanks!
White> Hi Stuart,
White> What you are proposing is sound provided that you #define _REENTRANT
White> when compiling and link with Spread's thread safe library libtspread.a.
White> Currently, you can get spurious errors from Spread for the following
White> reason: whenever there is a non-user error (socket failure, etc.) on a
White> mailbox/socket, the Spread user library immediately closes and
White> invalidates the mailbox/socket and returns CONNECTION_CLOSED.
White> Any subsequent SP calls on that mailbox/socket will return
White> ILLEGAL_SESSION. So, if your sender thread gets a CONNECTION_CLOSED
White> your recv'er thread would very likely get an ILLEGAL_SESSION (or maybe a
White> CONNECTION_CLOSED), and vice versa. Just treat any such ILLEGAL_SESSION
White> error as if it were a CONNECTION_CLOSED error.
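One way to apply the advice above is a small helper that folds ILLEGAL_SESSION into CONNECTION_CLOSED before the caller inspects the result. This is only a sketch: normalize_sp_error is a hypothetical name, and the fallback #define values are assumptions about what sp.h contains; a real program should include sp.h and rely on its definitions.

```c
/* In a real program these constants come from <sp.h>. The fallback values
 * below are assumptions so that this sketch compiles on its own. */
#ifndef CONNECTION_CLOSED
#define CONNECTION_CLOSED (-8)
#endif
#ifndef ILLEGAL_SESSION
#define ILLEGAL_SESSION (-11)
#endif

/* Hypothetical helper: report ILLEGAL_SESSION as CONNECTION_CLOSED so that
 * sender and receiver threads handle a dead mailbox the same way.
 * Every other return value is passed through unchanged. */
static int normalize_sp_error(int ret)
{
    return (ret == ILLEGAL_SESSION) ? CONNECTION_CLOSED : ret;
}
```

Each thread would pass the return value of its SP_* call through the helper and then run a single reconnect/cleanup path when it sees CONNECTION_CLOSED.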
White> Personally, I think that the Spread library should be modified to record
White> any such error and return it for all subsequent SP calls on that
White> mailbox. Furthermore, the mailbox/socket should be invalidated/closed
White> only upon the user calling SP_disconnect on it.
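The proposed behaviour can be pictured roughly as follows. This is not Spread's code: sticky_mbox and its functions are hypothetical names, and the sketch assumes a C11 compiler with <stdatomic.h>. The first error is recorded once and returned to every later caller, while the descriptor itself is left open until an explicit disconnect.

```c
#include <stdatomic.h>

/* Hypothetical per-mailbox state, for illustration only. */
typedef struct {
    int        fd;          /* underlying socket; stays open until disconnect */
    atomic_int first_error; /* 0 while healthy, otherwise the first error seen */
} sticky_mbox;

static void sticky_init(sticky_mbox *m, int fd)
{
    m->fd = fd;
    atomic_init(&m->first_error, 0);
}

/* Record err only if no error has been recorded yet, then return the error
 * that is now in effect (the first one, even if another thread raced us). */
static int sticky_fail(sticky_mbox *m, int err)
{
    int expected = 0;
    atomic_compare_exchange_strong(&m->first_error, &expected, err);
    return atomic_load(&m->first_error);
}

/* Checked at the top of every operation: non-zero means the mailbox has
 * already failed and the recorded error should be returned as-is. */
static int sticky_check(sticky_mbox *m)
{
    return atomic_load(&m->first_error);
}
```

With this shape, a sender and a receiver sharing one mailbox would both see the same error code, and the descriptor number could not be recycled until the application itself released it.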
White> If your program is opening and closing multiple Spread connections then
White> there is also an OS file descriptor reuse race condition that could be
White> causing problems. This race condition is best explained by example:
White> Imagine you have a sender thread (x) and a receiver thread (y) for
White> mailbox/socket A and another thread (z) which is going to call
White> SP_connect to create a mailbox/socket B. Just before x starts writing a
White> msg on A, y receives an error on A and therefore immediately
White> closes/invalidates it. Next, z successfully performs SP_connect and is
White> assigned mailbox/socket B, which happens to have the same value as A due
White> to the OS reusing file descriptor IDs. Finally, x happily (and
White> successfully) writes its msg for A on B not realizing that it is
White> actually writing to a different Spread connection!
White> This behavior is obviously not correct! I'm not sure if this race
White> condition exists on Windows but it definitely exists in Unix/Linux. I
White> don't know if this problem can be reliably handled on the daemon side
White> and I doubt if currently the daemon even tries to detect it.
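The descriptor reuse behind this race is easy to observe outside Spread. The sketch below uses only POSIX open() and close(), with /dev/null standing in for a socket; fd_number_is_reused is a hypothetical name. POSIX specifies that open() returns the lowest-numbered descriptor not currently open, so a number freed by close() is normally the next one handed out.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the descriptor number freed by close() is handed out again
 * by the very next open(), 0 if it is not, and -1 if open() fails. */
static int fd_number_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);   /* plays the role of mailbox A */
    if (a < 0)
        return -1;
    close(a);                              /* A is closed after an error */

    int b = open("/dev/null", O_RDONLY);   /* plays the role of mailbox B */
    if (b < 0)
        return -1;

    int reused = (a == b);                 /* same integer, different object */
    close(b);
    return reused;
}
```

Any thread still holding the old integer for A would now be operating on B, which is exactly the situation described in the example above.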
White> The only way I can think of to avoid this race condition with the
White> current Spread library is to instruct your OS not to reuse file
White> descriptor IDs (see NOTE above).
White> If the Spread library is modified as I suggested above, then I think the
White> race condition could be avoided by synchronizing calls to SP_connect and
White> SP_disconnect.