[Spread-users] Question about thread-safety

Yair Amir yairamir at cnds.jhu.edu
Fri Jun 13 16:40:40 EDT 2003


Hi,

The user.c program is one such example, as far as I remember. It can be
compiled to create sptuser, which demonstrates a dedicated receiving thread
(as opposed to the event-driven style of the regular spuser).

The source is the same user.c, but look at which code is compiled
when _REENTRANT is defined.
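
A minimal sketch of that compile-time switch, in the spirit of user.c but not its actual code. Here a pipe stands in for the Spread mailbox and read() stands in for SP_receive(). The names run_receiver() and receive_loop() are hypothetical.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a mailbox: any readable descriptor.
 * A real receive loop would call SP_receive() on a Spread mailbox. */
struct recv_ctx {
    int fd;
    long bytes;
};

/* Drain the descriptor until EOF or error (stand-in for "until the
 * receive call reports that the connection is closed"). */
static void *receive_loop(void *arg)
{
    struct recv_ctx *ctx = arg;
    char buf[256];
    ssize_t n;
    while ((n = read(ctx->fd, buf, sizeof buf)) > 0)
        ctx->bytes += n;
    return NULL;
}

/* Returns the number of bytes received, or -1 if the thread failed. */
long run_receiver(int fd)
{
    struct recv_ctx ctx = { fd, 0 };
#ifdef _REENTRANT
    /* Threaded build: a dedicated thread does the blocking receives.
     * This sketch just joins it; a real program would keep sending
     * from the main thread meanwhile. */
    pthread_t tid;
    if (pthread_create(&tid, NULL, receive_loop, &ctx) != 0)
        return -1;
    pthread_join(tid, NULL);
#else
    /* Non-threaded build: the caller receives directly, as an
     * event-driven loop would once the descriptor is readable. */
    receive_loop(&ctx);
#endif
    return ctx.bytes;
}
```

With GCC on Linux, building with -pthread also defines _REENTRANT, which selects the threaded branch.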

Cheers,

       :) Yair.
       
On Friday, June 13, 2003 3:20 PM
Stuart White (stwhit) <Stuart.White at acxiom.com> wrote:

White> Hi John,

White> Thanks for your comments.  I am #defining _REENTRANT and linking
White> libtspread.so.  I am not closing/opening multiple connections, so I don't
White> think I'm running into your file descriptor-reuse issue.

White> I decided to try locking a mutex around all SP_* calls to try to resolve the
White> problem, but it occurs to me that I cannot, because calls to SP_receive()
White> will block until a message is available.  If I lock a mutex around all SP_*
White> calls, you can see that this could easily create a deadlock.  A
White> call to SP_receive will block (with the mutex locked) until a message
White> becomes available, but none ever will, because the "sender" thread attempts
White> to lock the same mutex (and blocks) before he calls SP_multicast().
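
One way out of that deadlock, assuming an application-level lock is still wanted even though libtspread is meant to make one unnecessary, is to guard only the send path, so the receiving thread never holds the lock while it blocks. In this sketch a pipe again stands in for the mailbox, locked_send() stands in for SP_multicast(), and all names are hypothetical.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Protects only the send path. The receiver never takes it, so a
 * receive that blocks cannot stop a sender from making progress. */
static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for SP_multicast(): write one message under the lock. */
static int locked_send(int fd, const char *msg)
{
    size_t len = strlen(msg);
    pthread_mutex_lock(&send_lock);
    ssize_t n = write(fd, msg, len);
    pthread_mutex_unlock(&send_lock);
    return n == (ssize_t)len ? 0 : -1;
}

struct sender_arg { int fd; const char *msg; };
struct receiver_arg { int fd; long bytes; };

static void *sender(void *p)
{
    struct sender_arg *a = p;
    locked_send(a->fd, a->msg);
    return NULL;
}

/* Blocks in read() with no lock held; stops at EOF. */
static void *receiver(void *p)
{
    struct receiver_arg *r = p;
    char buf[64];
    ssize_t n;
    while ((n = read(r->fd, buf, sizeof buf)) > 0)
        r->bytes += n;
    return NULL;
}

/* Starts the receiver first, then two senders. Returns the total
 * bytes the receiver saw, or -1 on setup failure. */
long demo_send_lock(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct receiver_arg r = { fds[0], 0 };
    struct sender_arg s1 = { fds[1], "ping" }, s2 = { fds[1], "pong" };
    pthread_t rt, t1, t2;

    if (pthread_create(&rt, NULL, receiver, &r) != 0)
        return -1;
    int ok1 = pthread_create(&t1, NULL, sender, &s1) == 0;
    int ok2 = pthread_create(&t2, NULL, sender, &s2) == 0;
    if (ok1) pthread_join(t1, NULL);
    if (ok2) pthread_join(t2, NULL);
    close(fds[1]);            /* EOF lets the receiver finish */
    pthread_join(rt, NULL);
    close(fds[0]);
    return (ok1 && ok2) ? r.bytes : -1;
}
```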

White> Perhaps I'm experiencing one of the non-user errors you mentioned, such as
White> socket failure.  Bummer.

White> Are there any resources or example programs that demonstrate correct Spread
White> usage in threaded C applications?

White> Thanks!

White> -----Original Message-----
White> From: John Schultz [mailto:jschultz at d-fusion.net]
White> Sent: Friday, June 13, 2003 1:31 PM
White> To: spread-users at lists.spread.org
White> Subject: Re: [Spread-users] Question about thread-safety


White> NOTE: if anyone knows how to instruct Unix/Linux systems not to reuse 
White> file descriptor IDs in a process, could you please email me or the list 
White> a good reference (with page #)? Thanks!

White> Hi Stuart,

White> What you are proposing is sound provided that you #define _REENTRANT 
White> when compiling and link with Spread's thread safe library libtspread.a.

White> Currently, you can get spurious errors from Spread for the following 
White> reason: whenever there is a non-user error (socket failure, etc.) on a 
White> mailbox/socket, the Spread user library immediately closes and 
White> invalidates the mailbox/socket and returns CONNECTION_CLOSED.

White> Any subsequent SP calls on that mailbox/socket will return
White> ILLEGAL_SESSION.  So, if your sender thread gets a CONNECTION_CLOSED,
White> your receiver thread will very likely get an ILLEGAL_SESSION (or maybe a
White> CONNECTION_CLOSED), and vice versa.  Just treat any such ILLEGAL_SESSION
White> error as if it were a CONNECTION_CLOSED error.
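
The advice to treat ILLEGAL_SESSION like CONNECTION_CLOSED fits in a small helper. The constant values below are placeholders so the fragment compiles on its own; a real program would include <sp.h>, and the #ifndef guards then defer to Spread's own definitions. The helper names are hypothetical.

```c
/* Placeholder values for standalone compilation only; with <sp.h>
 * included first, Spread's definitions are used instead. */
#ifndef CONNECTION_CLOSED
#define CONNECTION_CLOSED (-8)
#endif
#ifndef ILLEGAL_SESSION
#define ILLEGAL_SESSION (-11)
#endif

/* Nonzero if an SP_* return code means the session is gone. Another
 * thread may have hit the real error first, after which the library
 * invalidated the mailbox and this thread sees ILLEGAL_SESSION. */
int session_is_dead(int ret)
{
    return ret == CONNECTION_CLOSED || ret == ILLEGAL_SESSION;
}

/* Fold ILLEGAL_SESSION into CONNECTION_CLOSED so callers check one code. */
int normalize_sp_error(int ret)
{
    return ret == ILLEGAL_SESSION ? CONNECTION_CLOSED : ret;
}
```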

White> Personally, I think that the Spread library should be modified to record 
White> any such error and return it for all subsequent SP calls on that 
White> mailbox.  Furthermore, the mailbox/socket should be invalidated/closed 
White> only upon the user calling SP_disconnect on it.
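
An application can approximate the first half of that proposal (consistent error reporting) on its own; it cannot stop the current library from closing the socket, which would need a library change. A hypothetical wrapper, not part of the Spread API, that remembers the first fatal return code:

```c
#include <pthread.h>

/* Hypothetical application-level wrapper around a mailbox. */
struct sticky_mbox {
    int mbox;                 /* the Spread mailbox (an int) */
    int first_error;          /* 0 until a fatal error is seen */
    pthread_mutex_t lock;     /* protects first_error */
};

void sticky_init(struct sticky_mbox *m, int mbox)
{
    m->mbox = mbox;
    m->first_error = 0;
    pthread_mutex_init(&m->lock, NULL);
}

/* Pass every SP_* return code through this. Once a negative code has
 * been seen, it is reported for all later calls. Simplification: a
 * real version would not treat recoverable codes such as
 * BUFFER_TOO_SHORT as fatal. */
int sticky_result(struct sticky_mbox *m, int ret)
{
    int out;
    pthread_mutex_lock(&m->lock);
    if (m->first_error != 0) {
        out = m->first_error;      /* report the original cause */
    } else {
        if (ret < 0)
            m->first_error = ret;  /* first failure wins */
        out = ret;
    }
    pthread_mutex_unlock(&m->lock);
    return out;
}
```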

White> If your program is opening and closing multiple Spread connections, then
White> there is also an OS file descriptor reuse race condition that could be
White> causing problems.  This race condition is best explained by example:

White> Imagine you have a sender thread (x) and a receiver thread (y) for 
White> mailbox/socket A and another thread (z) which is going to call 
White> SP_connect to create a mailbox/socket B.  Just before x starts writing a 
White> msg on A, y receives an error on A and therefore immediately 
White> closes/invalidates it. Next, z successfully performs SP_connect and is 
White> assigned mailbox/socket B, which happens to have the same value as A due 
White> to the OS reusing file descriptor IDs. Finally, x happily (and
White> successfully) writes its msg for A on B, not realizing that it is
White> actually writing to a different Spread connection!
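
The descriptor reuse behind this race is easy to observe. On POSIX systems open() returns the lowest-numbered unused descriptor, so in a single-threaded program closing one descriptor and then opening another yields the same number; on Linux, sockets such as Spread mailboxes are allocated from the same table. In this sketch /dev/null stands in for both Spread connections, and fd_number_reused() is a hypothetical helper.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the second descriptor reuses the first one's number,
 * 0 if it does not, and -1 on error. */
int fd_number_reused(void)
{
    int a = open("/dev/null", O_RDONLY);   /* plays mailbox A */
    if (a < 0)
        return -1;
    close(a);                              /* y's error path closes A */
    int b = open("/dev/null", O_RDONLY);   /* z's SP_connect gets B */
    if (b < 0)
        return -1;
    close(b);
    return a == b;
}
```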

White> This behavior is obviously not correct! I'm not sure if this race 
White> condition exists on Windows but it definitely exists in Unix/Linux. I 
White> don't know if this problem can be reliably handled on the daemon side 
White> and I doubt if currently the daemon even tries to detect it.

White> The only way I can think of to avoid this race condition with the
White> current Spread library is to instruct your OS not to reuse file
White> descriptor IDs (see NOTE above).

White> If the Spread library is modified as I suggested above, then I think the 
White> race condition could be avoided by synchronizing calls to SP_connect and 
White> SP_disconnect.
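
The proposed lifecycle can be sketched with a hypothetical handle type (not the Spread API): an error only marks the handle dead, the descriptor is closed solely by an explicit disconnect, and connect and disconnect share one lock. Because A's descriptor stays open, a new connection cannot receive its number, so a late write by x fails on the dead handle instead of landing on B.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* One lock serializes connect, disconnect and the dead flag. A real
 * design would likely use per-handle locks for the send path. */
static pthread_mutex_t lifecycle_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical handle: fd stands in for the Spread mailbox. */
struct handle {
    int fd;
    int dead;
};

/* Stand-in for SP_connect(): opens a descriptor under the lock. */
int handle_connect(struct handle *h)
{
    pthread_mutex_lock(&lifecycle_lock);
    h->fd = open("/dev/null", O_RDWR);
    h->dead = 0;
    int ret = h->fd < 0 ? -1 : 0;
    pthread_mutex_unlock(&lifecycle_lock);
    return ret;
}

/* Error path: record the failure but keep the descriptor open, so
 * its number cannot be handed to another connection. */
void handle_mark_dead(struct handle *h)
{
    pthread_mutex_lock(&lifecycle_lock);
    h->dead = 1;
    pthread_mutex_unlock(&lifecycle_lock);
}

/* Stand-in for SP_multicast(): refuses to write on a dead handle. */
ssize_t handle_send(struct handle *h, const char *msg, size_t len)
{
    ssize_t n = -1;
    pthread_mutex_lock(&lifecycle_lock);
    if (!h->dead && h->fd >= 0)
        n = write(h->fd, msg, len);
    pthread_mutex_unlock(&lifecycle_lock);
    return n;
}

/* Only an explicit disconnect closes the descriptor. Callers must
 * ensure no other thread uses the handle afterwards. */
void handle_disconnect(struct handle *h)
{
    pthread_mutex_lock(&lifecycle_lock);
    if (h->fd >= 0)
        close(h->fd);
    h->fd = -1;
    h->dead = 1;
    pthread_mutex_unlock(&lifecycle_lock);
}
```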




