[Spread-users] lost flooder messages

Yair Amir yairamir at cnds.jhu.edu
Tue May 20 16:29:25 EDT 2003


Hi Kelvin,

We finally got some time to check this out.
It seems that if the sender closes a socket before all of the
messages it sent have left the sending buffer, those messages may
never actually be sent.

When you run flooder, which disconnects immediately after sending the
last message, the operating system may discard the last messages
before Spread receives them. We verified (using spmonitor) that those
last messages are not received by Spread. This occurred at least on
Linux and FreeBSD.

As John Schultz correctly noted before, this is not a problem in
Spread or its semantics as Spread did not get these messages.

If this behavior is not desired, the sending process could, for
example, leave the group and wait to receive the leave notification
before it disconnects the socket (or exits the program). That
guarantees that all of the messages were received (and processed) by
Spread. Other ways to work around this operating system behavior are
also possible with Spread.
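
The same idea can be sketched without Spread at all. What follows is a
minimal, self-contained C illustration and not Spread code: a forked
child stands in for the daemon, and the sender blocks until the child
confirms a final END record before it closes the socket. The names
send_and_confirm, receiver, END_MARKER and MSG_LEN are made up for this
sketch. It shows only the wait-for-confirmation pattern; it does not
reproduce the loss itself. With Spread, the confirmation would instead
be the leave notification returned by SP_receive after SP_leave, and
only then would the client call SP_disconnect.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_LEN 16        /* hypothetical fixed record size, keeps framing trivial */
#define END_MARKER "END"  /* hypothetical "sender is done" record */

/* Read exactly len bytes; 0 on success, -1 on EOF or error. */
static int read_full(int fd, char *buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0) return -1;
        got += (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; 0 on success, -1 on error. */
static int write_full(int fd, const char *buf, size_t len) {
    size_t put = 0;
    while (put < len) {
        ssize_t n = write(fd, buf + put, len - put);
        if (n <= 0) return -1;
        put += (size_t)n;
    }
    return 0;
}

/* Stand-in for the daemon: count records until END, then report the count. */
static void receiver(int fd) {
    char rec[MSG_LEN];
    int count = 0;
    while (read_full(fd, rec, MSG_LEN) == 0) {
        if (strcmp(rec, END_MARKER) == 0) {
            write_full(fd, (const char *)&count, sizeof count);
            break;
        }
        count++;
    }
    close(fd);
}

/* Send n records, then END, and wait for the confirmation BEFORE closing.
 * Returns the number of records the receiver confirmed, or -1 on error. */
int send_and_confirm(int n) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return -1; }
    if (pid == 0) {               /* child plays the receiving side */
        close(sv[0]);
        receiver(sv[1]);
        _exit(0);
    }
    close(sv[1]);

    char rec[MSG_LEN];
    int confirmed = -1;
    int ok = 1;
    for (int i = 0; ok && i < n; i++) {
        memset(rec, 0, MSG_LEN);
        snprintf(rec, MSG_LEN, "msg %d", i);
        ok = (write_full(sv[0], rec, MSG_LEN) == 0);
    }
    if (ok) {
        memset(rec, 0, MSG_LEN);
        strcpy(rec, END_MARKER);
        ok = (write_full(sv[0], rec, MSG_LEN) == 0);
    }
    /* Block here until the peer has processed everything up to END. */
    if (ok && read_full(sv[0], (char *)&confirmed, sizeof confirmed) < 0)
        confirmed = -1;

    close(sv[0]);                 /* safe to close only after the confirmation */
    waitpid(pid, NULL, 0);
    return confirmed;
}
```

Calling send_and_confirm(500) from main() is expected to return 500,
since the sender does not close its end until the receiver has counted
every record. Unlike the sleep(1) workaround, this does not depend on
guessing how long the receiving side needs.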

   Hope this helps,

   :) Yair and Ryan.

On Friday, May 09, 2003 7:24 PM
Kelvin Fedrick Kelvin.Fedrick at noaa.gov wrote:



Kelvin> Yair Amir wrote:

>> Ok, so a different client - not spflooder but a regular spuser does
>> not receive all the messages. This should not happen.
>> Do you let that spuser stay up for a while and it still does
>> not receive the messages?

Kelvin> I leave the spuser up afterward. I've also tried running the "receive a
Kelvin> message (stuck)" option afterward with no luck.

>>
>> (if a client does not keep up, Spread buffers up to 1000 messages
>> for it, but then will try to send to it only in a few seconds)
>> so just to make sure:
>>
>> 1. you send 10000 messages by your modified spflooder
>> 2. the spuser that joins the "flooder" group ahead of time
>>    only receives the first 9994 messages even if you leave it running
>>    for a while after spflooder exited.
>>
>>    Is this correct?

Kelvin> That's correct except we've been testing with fewer messages sent (i.e. spflooder -m 500).

>>
>>    Also, try to send a message to "flooder" group by spuser after
>>    the modified spflooder exited  and see if you get that message.

Kelvin> I've tried this as well. It succeeds at getting the new message. Furthermore,
Kelvin> if I rerun spflooder again without restarting the spuser, it receives the new
Kelvin> messages, but again usually with the last few missing.

>>
>>
>>    Also, I am not sure in which program you added a "sleep" that
>>    bypassed the problem.
>>

Kelvin> A sleep(1) was added to flooder.c in main() just after the printf("flooder completed multicast of
Kelvin> ...
Kelvin> call and before the 'return 0'.

>>
>>    Also, specify exactly how you connected with Spread (IPC or TCP)
>>    and whether locally or remotely for any client that connected to
>>    spread in this experiment. Also, how many daemons you have in the
>>    system and which client connects to which daemon.

Kelvin> In the current configuration, just one daemon with both spuser and spflooder all
Kelvin> running on a Linux box. We have tried other configurations with remotely connected
Kelvin> clients with the same results.

Kelvin> Interestingly, I've just noticed that spflooder -m 10000 seems to work. We had been
Kelvin> testing with fewer messages and 'spflooder -m 500' consistently comes up short with
Kelvin> spuser only receiving the first 498 of 500 messages.

Kelvin> Kelvin




